Data sources
The original study cohort collated by Johns et al. [9] consisted of 2,669,328 women aged between 49 and 64 years who were ever registered on one of the NHS databases, known as the Exeter databases, covering 61 local health authorities in England and Wales between 1st January 1988 and 31st December 1994. The study database contained individual-level demographic data (age and sex), cancer registration data, death registration data and breast cancer screening histories. Women in the cohort were followed for cancer diagnosis and death until 31st December 2005. Concerns about the completeness of mortality data led to the analysis at the time being restricted to a subset of one million women. Detailed information about the data sources has been published previously [9].
As the aim of the current study was to assess the long-term impact of mammography screening on breast cancer incidence and mortality, up-to-date cancer incidence and mortality data were required. NHS number, date of birth and last known postcode (at the time of the original data extract in 2008) were submitted to NHS Digital for tracing on the Personal Demographics Service (PDS) database. Cancer and mortality data up to 11th October 2020 were then obtained for women who were successfully traced, with cancer registration data obtained by NHS Digital from Public Health England and mortality registration data from the Office for National Statistics (ONS). Records for women who had opted out of their data being used for research (through the National Data Opt-Out) were not returned, and these women were therefore excluded from the study. No additional screening history data were sought.
Study design
This study made use of the staggered introduction of the NHSBSP in the UK. The first screening units began inviting women in December 1987, with the roll out continuing until the end of 1994. The earliest date of invitation in the cohort was 10th January 1988. Women aged 50–64 years were invited to attend every three years. The age range has since increased to include women up to the age of 70 years, but this study is limited to women up to and including age 64 years at the time of recruitment.
Women aged 49–64 years entered the study on 1st January 1988, with younger women entering on their 49th birthday until 31st December 1994. All women were uninvited upon entry and remained in the uninvited group until the date of their first invitation, date of breast cancer diagnosis (incidence analysis), date of death or the end of follow-up on 11th October 2020, whichever was earliest. Women only needed to be eligible to receive their first invitation to screening (i.e. aged 49–64 years) between 1st January 1988 and 31st December 1994; they need not have actually received their first invitation during that period. Each woman’s first registered breast cancer diagnosis was selected, which may have been invasive cancer or ductal carcinoma in situ (DCIS). Because women’s exposure status could change over time, incidence-based mortality (IBM) methodology was used, in which mortality is compared between groups based on invitation or screening status at the time of diagnosis rather than at the time of death. This means that deaths from breast cancer in the invited group are only counted in women diagnosed after their first invitation [5, 19], as screening cannot affect deaths from cancers diagnosed before screening started [5]. Once in the invited group, women were followed up until date of diagnosis (incidence analysis), date of death or the end of follow-up, whichever was earliest. The same procedure was used for actual exposure to screening: rates of death were compared between women with tumours diagnosed before and after their first screen.
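The allocation of follow-up time to the uninvited and invited groups described above can be illustrated with a minimal Python sketch for the incidence analysis. It is not the authors' code (the analyses were run in Stata); the function name `split_person_years`, its arguments and the 365.25-day year are assumptions made for illustration only.

```python
from datetime import date
from typing import Optional, Tuple

END_OF_FOLLOW_UP = date(2020, 10, 11)  # end of follow-up stated in the text
DAYS_PER_YEAR = 365.25                 # assumed conversion from days to years


def split_person_years(
    entry: date,
    first_invitation: Optional[date],
    diagnosis: Optional[date],
    death: Optional[date],
    end: date = END_OF_FOLLOW_UP,
) -> Tuple[float, float]:
    """Return (uninvited, invited) person-years for one woman (hypothetical helper)."""
    # Follow-up stops at diagnosis, death or end of follow-up, whichever is earliest.
    exit_date = min(d for d in (diagnosis, death, end) if d is not None)
    if exit_date <= entry:
        return 0.0, 0.0
    # A woman stays uninvited until her first invitation, if it precedes her exit.
    if first_invitation is None or first_invitation >= exit_date:
        switch = exit_date
    else:
        switch = max(first_invitation, entry)
    uninvited = (switch - entry).days / DAYS_PER_YEAR
    invited = (exit_date - switch).days / DAYS_PER_YEAR
    return uninvited, invited
```

Summing the two components over all women gives the person-year denominators for each exposure group. Under IBM, the mortality analysis would additionally attribute each breast cancer death to the group the woman was in at diagnosis, which this sketch does not cover.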
In the 2017 study by Johns et al. [9], two-thirds of the cohort had to be disregarded because of incomplete mortality data in the early years of the study (1988–1990). It was hoped that the new mortality data obtained for this follow-up study would be complete. Dates of death were provided in both the demographics file extracted from the PDS database and the ONS mortality file, with the latter also containing cause of death information. Comparison of the dates of death between the two files revealed that more than 50,000 women were listed as having died in the demographics file but did not appear in the mortality file, with the earliest death in the mortality file listed as 4th June 1991. However, over 70% of these deaths were found in the original study database collated by Johns et al., and thus dates of death in the demographics file were taken as correct and used in the final analysis. Johns et al. had undertaken bespoke matching to ascertain causes of death for the early deaths in the cohort, as this information was not routinely available from the ONS mortality database. Cause of death was taken from the later mortality file unless it was missing, in which case it was taken from the original study database where available. The dates of death which appeared only in the demographics file were evenly distributed across the study period, suggesting no systematic issue with the recording of mortality data on the ONS database.
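The precedence rule for death records described above can be summarised in a short Python sketch. The function `resolve_death` and its argument names are hypothetical, and causes of death are represented as plain strings purely for illustration.

```python
from datetime import date
from typing import Optional, Tuple


def resolve_death(
    demographics_dod: Optional[date],
    ons_cause: Optional[str],
    original_cause: Optional[str],
) -> Tuple[Optional[date], Optional[str]]:
    """Combine death information from the three sources (hypothetical helper)."""
    # The date of death in the demographics (PDS) file is taken as correct.
    if demographics_dod is None:
        return None, None
    # Cause of death comes from the later ONS mortality file unless missing,
    # in which case the original study database is used where available.
    cause = ons_cause if ons_cause is not None else original_cause
    return demographics_dod, cause
```

A death with no cause in either source keeps its date but has a missing cause.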
Statistical analysis
Regression modelling
Relative rates assessing the effect of first invitation to and first attendance at screening were calculated for both breast cancer incidence and mortality, with overdiagnosis estimated as any residual increase in incidence associated with attendance at screening observed at the end of the follow-up period. Rates were calculated by dividing the number of breast cancer cases or deaths by the number of person-years in each exposure group. Confounding by age and calendar time was adjusted for by fitting a Poisson regression model including age at entry and mean year of follow-up in the uninvited period as continuous covariates. The adjustment for time was based on the uninvited period only, as not all women were invited or screened.
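As a rough illustration of the rate calculation, the Python sketch below computes a crude (unadjusted) rate ratio from event counts and person-years, with a Wald confidence interval on the log scale. This is a simplification: it omits the adjustment for age at entry and mean year of follow-up that the Poisson regression provides, and the function name and example counts are hypothetical.

```python
import math
from typing import Tuple


def crude_rate_ratio(
    events_exposed: int,
    py_exposed: float,
    events_unexposed: int,
    py_unexposed: float,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Return (rate ratio, lower, upper) without covariate adjustment."""
    if events_exposed <= 0 or events_unexposed <= 0:
        raise ValueError("both groups need at least one event")
    # Rate = number of cases or deaths divided by person-years.
    rate_exposed = events_exposed / py_exposed
    rate_unexposed = events_unexposed / py_unexposed
    rr = rate_exposed / rate_unexposed
    # Standard error of log(RR) for Poisson-distributed counts.
    se_log_rr = math.sqrt(1 / events_exposed + 1 / events_unexposed)
    lower = rr * math.exp(-z * se_log_rr)
    upper = rr * math.exp(z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 80 deaths vs 100 deaths over equal person-years.
rr, lower, upper = crude_rate_ratio(80, 100_000.0, 100, 100_000.0)
```

The crude ratio is only a starting point; the adjusted estimates reported in the study come from the regression model.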
The absolute risk reduction was calculated using the method developed by Marmot et al. [7] in which the number needed to be invited (NNI) to prevent one death is given by:
$${NNI}=\,\frac{1}{{R}_{a}(1-{R}_{I})}$$
where \({R}_{a}\) is the estimated rate at which women aged 50 years would currently be expected to die from breast cancer between the ages of 55 and 79 years in the absence of screening, and \({R}_{I}\) is the estimated relative mortality rate for women invited to screening compared with uninvited women.
Given a participation rate p, the number needed to screen to prevent one death is:
\({R}_{a}\) was taken to be 2.13 as calculated by Marmot et al. [7].
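A short worked example may help to show how the NNI formula behaves. The Python sketch below assumes that the value 2.13 is a percentage (a proportion of 0.0213) and uses a purely hypothetical relative rate of 0.80 for illustration; neither the function name nor the example relative rate comes from the study.

```python
def number_needed_to_invite(r_a: float, r_i: float) -> float:
    """NNI = 1 / (R_a * (1 - R_I)), with R_a expressed as a proportion."""
    if not 0.0 < r_a <= 1.0:
        raise ValueError("r_a must be a proportion in (0, 1]")
    if not 0.0 <= r_i < 1.0:
        raise ValueError("r_i must be below 1 for a mortality reduction")
    return 1.0 / (r_a * (1.0 - r_i))


R_A = 0.0213          # assumption: 2.13 read as 2.13%
R_I_EXAMPLE = 0.80    # hypothetical relative rate, illustration only

nni = number_needed_to_invite(R_A, R_I_EXAMPLE)
```

Under these assumptions the result is roughly 235 women invited per death prevented. Because the NNI depends on 1 − R_I, it grows rapidly as the relative rate approaches 1.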
Self-selection bias
Self-selection bias occurs because women who choose not to attend their screening invitation tend to have a higher risk of dying from breast cancer than those who choose to attend, resulting in a bias in favour of screening [20]. This was corrected in the analysis using a modified version of the method proposed by Duffy et al. [21].
The relative rate of breast cancer incidence/mortality for compliers compared with the uninvited, ψ, was estimated as:
$${{\rm{\psi}}} =\frac{P({Diagnosis}\,{or}\,{death}\,{from}\,{breast}\,{cancer}|{Comply}\,{with}\,{screening}\,{invitation})}{P({Diagnosis}\,{or}\,{death}\,{from}\,{breast}\,{cancer}|{Not}\,{invited})}$$
However, the quantity of interest is an unbiased estimate of the effect of attendance at screening, R, which is given by the following equation (Duffy et al. [21]):
$$R=\frac{P\,({Diagnosis}\,{or}\,{death}\,{from}\,{breast}\,{cancer}|{Comply}\,{with}\,{screening}\,{invitation})}{P\,({Diagnosis}\,{or}\,{death}\,{from}\,{breast}\,{cancer}|{Not}\,{invited}\,{but}\,{would}\,{attend}\,{if}\,{invited})}$$
With some algebra, it can be shown that:
$$R=\frac{p{{\rm{\psi }}}}{1-(1-p){D}_{r}}$$
where p is the proportion complying with an invitation to screening, \({{\rm{\psi }}}\) is the estimated relative rate of breast cancer incidence/mortality for compliers compared with the uninvited and \({D}_{r}\) is the estimated relative rate of breast cancer incidence/mortality for those who were invited but not screened compared with the uninvited (see Supplementary Material). Given the availability of an uninvited comparison group, a population-specific correction factor \({D}_{r}\) was estimated rather than taking an estimate from previously published RCTs, as has often been done in previous studies. For the mortality analysis, \({D}_{r}\) was estimated from an additional analysis of all-cause mortality excluding breast cancer deaths, using non-IBM methodology to enable the inclusion of all deaths in the cohort, not just those in women who had a diagnosis of breast cancer (Table S1). The rationale is that any difference in mortality from causes other than breast cancer in those invited but not screened cannot be due to screening and is therefore almost certainly due to self-selection. For the incidence analysis, the outcome of interest for the self-selection factor \({D}_{r}\) was breast cancer incidence. Two self-selection-adjusted estimates were calculated: one using an attendance rate p of 70%, which is the minimum performance threshold set by the NHSBSP and very close to the average attendance rate achieved by the programme, and one using a cohort-specific estimate for p, calculated as the number of person-years accrued in women who attended screening divided by the total number of person-years in all women invited.
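The correction can be illustrated with a minimal Python sketch of the formula R = pψ / (1 − (1 − p)D_r), together with the cohort-specific attendance rate defined from person-years. The function names and the example values are hypothetical and are not results from the study.

```python
def self_selection_adjusted_rr(psi: float, d_r: float, p: float) -> float:
    """R = p * psi / (1 - (1 - p) * D_r)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be a proportion in (0, 1]")
    denominator = 1.0 - (1.0 - p) * d_r
    if denominator <= 0.0:
        raise ValueError("(1 - p) * d_r must be below 1")
    return p * psi / denominator


def cohort_attendance_rate(py_attended: float, py_invited: float) -> float:
    """Person-years in attenders divided by person-years in all women invited."""
    return py_attended / py_invited


# Hypothetical inputs: psi = 0.60, D_r = 1.10, programme threshold p = 0.70.
adjusted = self_selection_adjusted_rr(psi=0.60, d_r=1.10, p=0.70)
```

Two properties serve as sanity checks. When D_r = 1 (non-attenders have the same underlying risk as the uninvited), the formula returns R = ψ, so no correction is applied. When D_r > 1 (non-attenders are at higher risk), R is larger than ψ, which moves the estimate towards the null and removes the bias in favour of screening.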
Analyses were conducted using Stata version 18 [22] with p-values and 95% confidence intervals (CIs) calculated using Poisson regression and a p-value of <0.05 considered to indicate statistical significance.
Ethical approval
All methods were performed in accordance with the relevant guidelines and regulations, including UK GDPR and the Data Protection Act 2018. Ethical approval was obtained from NRES Committee London - South East (reference number 02/1/064), and the study received approval under Section 251 of the NHS Act 2006 (to access data without informed consent) through the Confidentiality Advisory Group (CAG) (reference 19/CAG/0160).

