
IARC 60th Anniversary - 19-21 May 2026

Session : 21/05/26 - Posters

Constrained Multiple Imputation for Nonspecific Histological Type in Cancer Registries: A Practical Guide

NGUYEN T. 1

1 Hitotsubashi University, Tokyo, Japan

Background
Histological subtype is a key variable in cancer registry research, yet a substantial proportion of cases are coded using nonspecific morphology categories that obscure underlying subtype information. These nonspecific codes impose deterministic diagnostic constraints, creating structural zeros in the outcome space. Standard categorical multiple imputation methods do not account for such constraints and may generate logically invalid imputations, compromising data integrity and downstream inference.
Objectives
This paper formalizes a constrained imputation algorithm for such data and provides practical guidance for its implementation in applied research. Rather than presenting new epidemiological findings, we focus on describing the methodological challenge posed by nonspecific histology, outlining the proposed solution, and demonstrating its use through step-by-step R implementation and diagnostic visualization. Although motivated by cancer registry data, the approach is broadly applicable to any setting involving categorical variables with structural constraints.
Methods
We propose a constrained multiple imputation (CMI) approach for categorical variables with structural constraints. The method modifies the sampling step of multinomial imputation within the multiple imputation by chained equations (MICE) framework by masking incompatible categories and renormalizing posterior probabilities prior to sampling. This guarantees that all imputed values respect predefined diagnostic rules while retaining the flexibility and efficiency of standard MI. We provide a formal description of the algorithm and detailed, step-by-step implementation guidance in R.
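The masking and renormalization step described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the author's R implementation. The category names, the predicted probabilities, and the rule that a nonspecific "non-small cell, NOS" code excludes the small cell subtype are all hypothetical assumptions chosen to make the example concrete.

```python
import random

def constrained_draw(probs, allowed, rng=random):
    """Sample one category after masking inadmissible ones.

    probs   : dict mapping category -> predicted probability from the
              (unconstrained) multinomial imputation model
    allowed : set of categories compatible with the nonspecific code
    Returns the sampled category and the renormalized probabilities.
    """
    # Mask: structurally impossible categories get probability zero.
    masked = {k: (p if k in allowed else 0.0) for k, p in probs.items()}
    total = sum(masked.values())
    if total <= 0.0:
        raise ValueError("no admissible category has positive probability")
    # Renormalize so the admissible probabilities sum to one.
    renorm = {k: p / total for k, p in masked.items()}
    # Sample only among admissible categories.
    cats = [k for k in renorm if k in allowed]
    weights = [renorm[k] for k in cats]
    return rng.choices(cats, weights=weights, k=1)[0], renorm

# Hypothetical predicted probabilities for one case (assumed values).
probs = {"adenocarcinoma": 0.4, "squamous": 0.3,
         "small_cell": 0.2, "large_cell": 0.1}
# Hypothetical constraint: the nonspecific code rules out small_cell.
allowed = {"adenocarcinoma", "squamous", "large_cell"}

rng = random.Random(2026)
draw, renorm = constrained_draw(probs, allowed, rng)
draws = [constrained_draw(probs, allowed, rng)[0] for _ in range(1000)]
```

In this sketch the masked category keeps probability zero, so no draw can violate the constraint, while the relative odds among admissible categories are unchanged by the renormalization.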
Results
Because only the sampling step is modified, the constrained imputation approach fits into existing MICE workflows and is fully compatible with Rubin’s rules for inference. In an illustrative application using population-based cancer registry data, the method produced logically valid imputations with stable convergence and plausible subtype distributions, while eliminating invalid category assignments observed under unconstrained imputation.
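For readers unfamiliar with the pooling step, Rubin's rules combine the estimates from the m imputed datasets by averaging the point estimates and adding the within-imputation variance to an inflated between-imputation variance. The following is a minimal Python sketch of that standard calculation. The function name and the input numbers are made up for illustration and are not results from the registry application.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # within-imputation variance
    b = variance(estimates)        # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b        # total variance
    return q_bar, t

# Made-up subtype proportions from m = 3 imputed datasets (assumed values).
estimates = [0.30, 0.34, 0.32]
variances = [0.0004, 0.0004, 0.0004]
q_bar, total_var = pool_rubin(estimates, variances)
```

Since the constraint only changes how each imputed value is drawn, the pooling calculation itself is the same as in unconstrained multiple imputation.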
Conclusions/Implications for practice or policy
Constrained multiple imputation offers a principled and practical solution for handling nonspecific histological coding and other categorical variables with structural zeros. The approach improves data validity without sacrificing statistical efficiency and is broadly applicable beyond cancer registry research. The accompanying implementation guidance facilitates transparent and reproducible use in applied studies.