IARC 60th Anniversary - 19-21 May 2026

Session : 20/05/26 - Posters

The Belgian Occupational Cancer (BOCCA) dataset: Construction and insights

GOOSSENS J. ¹,⁵, SMEETS W. ², VAN DIJCK H. ²,³,⁴, DE RAEVE H. ¹, VANDENBROECK S. ¹,⁵, GODDERIS L. ¹,⁵

¹ IDEWE, Leuven, Belgium; ² Data Science Institute, Hasselt University, Hasselt, Belgium; ³ L-Biostat, KU Leuven, Leuven, Belgium; ⁴ Centre for Health Economics Research & Modelling Infectious Diseases, University of Antwerp, Antwerp, Belgium; ⁵ Centre for Environment and Health, KU Leuven, Leuven, Belgium

Background:
A substantial proportion of cancer diagnoses is preventable, with potential causes rooted in lifestyle, infectious, environmental, or occupational factors. The multifactorial nature of cancer presents a major challenge for its prevention and treatment. Because people spend a substantial part of their lives at work, occupational exposures are an important risk factor. There is therefore a critical need for data that elucidate the relationship between occupational exposures and various types of cancer.

Objectives:
This project aims to construct the first Belgian Occupational Cancer (BOCCA) dataset by linking occupational health data from IDEWE, one of Belgium’s largest Services for Prevention and Protection at Work, with the Belgian Cancer Registry (BCR). The goal is to enable detailed exploration of associations between occupational exposures, job sectors, and cancer risk, thereby informing prevention and policy.

Methods:
The BOCCA dataset was established through retrospective linkage of IDEWE data (1992–2020) and BCR data (2004–2020) using the Social Security Identification Number. The dataset includes longitudinal occupational health records, exposure information (21 binary variables), and lifestyle-related risk factors (BMI, physical activity, smoking status). Cancer incidence was identified via BCR records. Statistical analyses employed Cox models with time-varying coefficients to assess associations, and multilevel multiple imputation was used to address missing data.
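The linkage step described above can be illustrated with a minimal sketch. It assumes both sources share a single person-level key and that the occupational records are kept even when no registry match exists. The field names, the toy records, and the `link_records` helper are hypothetical and do not reflect the actual BOCCA schema or pipeline.

```python
from collections import defaultdict

# Hypothetical toy records; field names are illustrative, not the BOCCA schema.
occupational_visits = [
    {"person_id": "A", "visit_year": 1998, "sector": "healthcare"},
    {"person_id": "A", "visit_year": 2006, "sector": "healthcare"},
    {"person_id": "B", "visit_year": 2010, "sector": "construction"},
    {"person_id": "C", "visit_year": 2015, "sector": "education"},
]
registry_records = [
    {"person_id": "A", "diagnosis_year": 2012, "site": "breast"},
    {"person_id": "D", "diagnosis_year": 2009, "site": "lung"},  # no occupational record
]


def link_records(visits, diagnoses):
    """Group visits per person and attach registry diagnoses with the same key.

    This behaves like a left join from the occupational data: people who
    appear only in the registry are not included in the result.
    """
    diagnoses_by_person = defaultdict(list)
    for diagnosis in diagnoses:
        diagnoses_by_person[diagnosis["person_id"]].append(diagnosis)

    linked = {}
    for visit in visits:
        key = visit["person_id"]
        person = linked.setdefault(
            key,
            {"visits": [], "diagnoses": diagnoses_by_person.get(key, [])},
        )
        person["visits"].append(visit)
    return linked


linked = link_records(occupational_visits, registry_records)

# People with at least one linked registry diagnosis.
cases = sorted(pid for pid, record in linked.items() if record["diagnoses"])
```

In this toy example, person A has two occupational observations and one linked diagnosis, while person D is dropped because there is no occupational record to link to. In practice the linkage key would be pseudonymised before any analysis.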

Results:
The resulting BOCCA dataset comprises over 3.7 million observations on 773,834 unique individuals. The dataset is longitudinal, with the number of observations varying by job type and IDEWE follow-up frequency. On average, there are 4.8 observations per individual, ranging from 1 to 28. Occupational sectors represented include healthcare (25.5%), manufacturing (12.2%), distributive trade (11.9%), education (11.3%), services (10.6%), government (9.0%), construction (5.8%), transport (5.6%), other (4.0%), unknown (2.3%), and food (2.0%). A total of 21,314 cancer cases were identified, the most common being breast cancer (n=5,980), prostate cancer (n=2,118), malignant melanoma (n=2,060), lung cancer (n=1,341), and colon cancer (n=1,060). Because BOCCA is the first dataset of its kind, its construction also yielded insights into how data quality can be improved for further, more targeted research.
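As a quick consistency check on the figures above, the reported mean of 4.8 matches the total number of observations divided by the number of unique individuals. The short sketch below treats "over 3.7 million" as exactly 3,700,000, which is an assumption, so the result is approximate. It also sums the reported sector shares.

```python
# "Over 3.7 million" is taken as 3,700,000 here; this is an assumed lower bound.
total_observations = 3_700_000
unique_individuals = 773_834

mean_observations = total_observations / unique_individuals

# Sector shares in percent, as reported in the abstract.
sector_shares = {
    "healthcare": 25.5,
    "manufacturing": 12.2,
    "distributive trade": 11.9,
    "education": 11.3,
    "services": 10.6,
    "government": 9.0,
    "construction": 5.8,
    "transport": 5.6,
    "other": 4.0,
    "unknown": 2.3,
    "food": 2.0,
}
total_share = sum(sector_shares.values())

print(f"mean observations per individual: {mean_observations:.1f}")
print(f"sum of sector shares: {total_share:.1f}%")
```

The shares sum to slightly more than 100%, which is consistent with rounding of the individual percentages.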

Conclusions/Implications:
The BOCCA dataset provides an exploratory resource for investigating the relationships between occupational exposures and cancer incidence in Belgium's working population. This study highlights critical data collection requirements, analytical approaches, and relevant external factors for further analysis. However, quantifying the association between specific risk factors, professions, and cancers remains challenging because many external factors influence these relationships.

Funding: This work was funded by the Belgian organization ‘Kom op tegen Kanker’ under the project: ‘Belgian Occupational Cancer (BOCCA) database to identify and prevent work-related cancers.’