IARC 60th Anniversary - 19-21 May 2026
Session: 20/05/26 - Posters
Leveraging Automation and Analytics to Strengthen Population-Based Cancer Registries in a Middle-Income Setting: An Interinstitutional Experience in Colombia
PARDO C. 1, PORTILLA N. 2, COLLAZOS P. 2, GRILLO E. 2, GARCIA L. 2, CORTEZ A. 2
1 National Cancer Institute, Bogota, Colombia; 2 Universidad del Valle, Cali, Colombia
Background: Population-Based Cancer Registries (PBCRs) are essential for monitoring cancer incidence, survival, and outcomes, and for informing public health policies. However, fragmentation of data sources, heterogeneity in reporting formats, and limited automation in data extraction processes compromise data quality, timeliness, and completeness, leading to underreporting and biased epidemiological estimates in many low- and middle-income settings.
Objective: To optimize information extraction methods to improve data quality assurance in the Population-Based Cancer Registries of Barranquilla and Neiva, Colombia, through system integration, automation, and capacity building.
Methods: An interinstitutional project was implemented between the National Cancer Institute of Colombia and Universidad del Valle. The methodological approach included: (i) a situational diagnosis of data sources, workflows, and extraction practices; (ii) integration of national cancer-related information systems, including the National Public Health Surveillance System (SIVIGILA), the monitoring system for high-cost diseases, which covers cancer (Cuenta de Alto Costo), and national mortality data; (iii) automation of data extraction from structured and unstructured sources, mainly cancer pathology reports (PDFs, Word documents, and electronic files), using Python-based scripts and standardized templates; (iv) implementation of relational databases in PostgreSQL for consolidated data storage; (v) technical training through virtual workshops and on-site visits focused on database management, automated extraction, and interoperability; and (vi) implementation of real-time monitoring dashboards in Power BI® to evaluate data traceability and quality. Data quality was assessed along the standard dimensions of validity, completeness, and timeliness.
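As an illustration only, step (iii) and the completeness dimension might be sketched in Python as follows. This is not the project's actual script: the field labels, regular expressions, and function names are hypothetical, the report text is assumed to have already been extracted from the PDF or Word file, and completeness is computed here simply as the share of non-missing fields.

```python
import re

# Hypothetical field labels; real pathology reports vary by institution,
# so these patterns are assumptions made for illustration.
FIELD_PATTERNS = {
    "patient_id": re.compile(r"Patient ID:\s*(\S+)", re.IGNORECASE),
    "diagnosis_date": re.compile(
        r"Date of diagnosis:\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE
    ),
    "topography": re.compile(r"Topography:\s*(C\d{2}\.\d)", re.IGNORECASE),
    "morphology": re.compile(r"Morphology:\s*(\d{4}/\d)", re.IGNORECASE),
}


def extract_fields(report_text):
    """Map each template field to its value in the text, or None if absent."""
    record = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(report_text)
        record[name] = match.group(1) if match else None
    return record


def completeness(records):
    """Share of non-missing field values across all extracted records."""
    total = sum(len(record) for record in records)
    if total == 0:
        return 0.0
    filled = sum(
        1 for record in records for value in record.values() if value is not None
    )
    return filled / total


# Two made-up reports; the second lacks a morphology code.
sample_reports = [
    "Patient ID: A123\nDate of diagnosis: 2024-03-15\n"
    "Topography: C50.9\nMorphology: 8500/3",
    "Patient ID: B456\nDate of diagnosis: 2024-04-02\nTopography: C18.7",
]
records = [extract_fields(text) for text in sample_reports]
print(records[1]["morphology"])  # None
print(completeness(records))     # 0.875
```

Records extracted this way could then be loaded into the relational database of step (iv), with the completeness value feeding a dashboard indicator of the kind described in step (vi).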
Results: Both registries achieved substantial methodological and operational improvements. The Neiva PBCR completed full characterization of institutional sources, implemented functional relational databases, and deployed automated extraction scripts, achieving compliance levels above 85% across evaluated indicators. Dynamic dashboards enabled real-time monitoring of data flows, consistency checks, and management indicators. The Barranquilla PBCR consolidated a comprehensive census of oncology service providers, reached advanced stages of automation (>90%) in data extraction processes, and strengthened real-time traceability through interactive visualization tools. Across both settings, automation substantially reduced manual processing time, improved consistency between sources, and enhanced integration of historical and newly reported cases. The project also facilitated interinstitutional collaboration and alignment with national surveillance systems.
Conclusions: The implementation of standardized, automated data extraction methodologies combined with relational databases, analytical dashboards, and targeted capacity building substantially improves data quality, efficiency, and sustainability in Population-Based Cancer Registries. This model demonstrates scalability and adaptability to other regions and supports the strengthening of national cancer surveillance systems by enabling more timely, reliable, and comprehensive epidemiological information for cancer control planning and evaluation.