From:  Impact of smoking on subtypes and molecular profile of breast cancer: a systematic review

 Characteristics of studies assessing smoking in relation to tumor molecular profiles.

First author, year | Country/Study population | Study design & sample (molecular subset) | Molecular data type (genes) | Platform/Assay | Bioinformatics/Statistical methods | Smoking exposure definition & analysis | Molecular outcomes/markers | Key findings on smoking-molecular associations
Callahan et al., 2019 [26] | USA; women with incident primary breast cancer aged 35–79 years in the WEB (Western New York Exposures and Breast Cancer) study | Case-only analysis within population-based case-control study; FFPE tumor tissue from 718 breast cancer cases with methylation data (≈ 225 premenopausal, 493 postmenopausal; gene-specific n varies) | Tumor DNA methylation in promoter regions of 9 candidate genes: SCGB3A1, CDKN2A, FHIT, GSTP1, SFN, BRCA1, RARB, CCND2, SYK; methylation quantified as % and dichotomized as > median vs. ≤ median for each gene | Microdissected FFPE tumor DNA; bisulfite conversion (EZ DNA Methylation Kit, Zymo); targeted pyrosequencing (Qiagen pyrosequencing system) using commercial and custom primer sets; data processed with Pyro Q-CpG software | Gene-level methylation treated as binary outcome (above vs. ≤ median); unconditional logistic regression estimating ORs and 95% CIs for methylation by smoking exposures; models stratified by menopausal status and adjusted for age and ER status (plus pack-years for active smoking); cumulative SHS and pack-years dichotomized at median among exposed; period-specific exposures defined in 7 age windows; only cells with ≥ 5 subjects reported | Active smoking: detailed lifetime history by 7 age periods (< 21, 21–30, 31–40, 41–50, 51–60, 61–70, > 70 years); defined as ever smoking in each period, overall smoking status (never/former/current), and cumulative pack-years (total, dichotomized around median among exposed). SHS: among never smokers (< 100 cigarettes lifetime), residential and occupational SHS exposure in the same age windows; cumulative years of SHS summed across life and categorized (none, ≤ median, > median; e.g., ≤ 30 vs. > 30 years in postmenopausal women) | Gene-specific tumor promoter methylation status (high vs. low) for SCGB3A1, CDKN2A, FHIT, GSTP1, SFN, BRCA1, RARB, CCND2, SYK; no composite molecular signatures; ER/PR/HER2 and triple-negative status used as covariates, not primary outcomes | Premenopausal: active smoking before 21, 21–30, and 41–50 years associated with lower odds of SCGB3A1 hypermethylation (OR ≈ 0.25–0.30); smoking before 21 associated with higher GSTP1 methylation (OR ≈ 2.6); smoking at 31–40 associated with lower BRCA1 methylation (OR ≈ 0.09). Postmenopausal: active smoking at 41–50 strongly associated with higher FHIT methylation (OR ≈ 4.6) and at 51–60 with higher GSTP1 methylation (OR ≈ 2.3); current vs. never smokers had increased CDKN2A methylation (OR ≈ 2.1; p-trend ≈ 0.02); higher pack-years (> median) associated with increased CDKN2A methylation (OR ≈ 2.0). Among postmenopausal never-smokers, greater cumulative SHS was inversely associated with BRCA1 and SYK methylation (e.g., > 30 years SHS vs. none for BRCA1 OR ≈ 0.3). No consistent associations for premenopausal SHS or for most other genes
Takada et al., 2020 [16] | Japan; women with resectable primary breast cancer undergoing curative surgery at Osaka City University Hospital (2007–2018); subset with biopsy/resection of recurrent lesions and known smoking history | Single-centre retrospective cohort of 989 primary breast cancer patients; recurrences in 77, of whom 50 (with paired primary-recurrent tissue and recorded smoking history) were included for molecular/smoking analyses; all were preoperative systemic-therapy-naïve | Protein expression of ER, PR, HER2 and Ki-67 in primary and recurrent tumors by immunohistochemistry; tumors classified into intrinsic subtypes: HRBC (ER and/or PR+), HER2BC (ER−/PR−/HER2+), TNBC (ER−/PR−/HER2−) | Standard immunohistochemistry on surgical and recurrent biopsy/resection specimens in institutional pathology lab; Ki-67 proliferation index evaluated with a 14% cutoff; imaging (US, CT, bone scintigraphy) used for staging but not for molecular classification | Concordance/discordance in receptor status (ER, PR, HER2) between primary and recurrent tumors evaluated; chi-square tests for associations between receptor conversion and clinicopathological factors; logistic regression to estimate ORs and 95% CIs for positive HER2 conversion by smoking status and pack-year categories; Kaplan–Meier curves and log-rank tests for progression-free survival (PFS) and post-recurrence survival (PRS); Cox proportional hazards models for univariate and multivariate prognostic analyses | Smoking history was recorded at the first visit (cigarettes/day and years of smoking); pack-years calculated as (cigarettes per day ÷ 20) × years; patients classified as smokers (any history) vs. non-smokers; 14/50 (28%) were smokers with median 30 pack-years (range 1.4–150); for HER2-conversion analyses, smokers were further grouped by pack-years (≤ 25, 25–50, > 50) vs. non-smokers; smoking assessed only up to surgery (no longitudinal updates) | Changes in IHC status of ER, PR, and HER2 between primary and recurrent tumors; intrinsic subtype change (HRBC/HER2BC/TNBC) at recurrence; observed conversion rates: ER negative conversion 3/50 (6%), ER positive conversion 1/50 (2%); PR negative conversion 15/50 (30%); HER2 positive conversion 6/50 (12%), no HER2 negative conversion; intrinsic subtype change in 5/50 (10%) | Positive HER2 conversion at recurrence was significantly more frequent in smokers (4/14; 28.6%) than in non-smokers (2/36; 5.6%) (p = 0.024); logistic regression showed smokers vs. non-smokers had higher odds of HER2 positive conversion (OR 6.8, 95% CI 1.082–42.731), with ORs increasing across higher pack-year categories (up to OR 17.0 for > 50 pack-years vs. non-smokers, albeit with wide CIs); smoking was not significantly associated with ER or PR conversion, intrinsic subtype change, or other clinicopathological variables
Wang et al., 2021 [17] | TCGA pan-cancer cohort (BLCA, CESC, ESCA, HNSC, KIRP, LUAD, LUSC); 2,317 tumor patients with recorded smoking history and multi-omics data | Retrospective multi-omics analysis of TCGA level-3 data across 7 smoking-related cancers; integrated RNA-seq, miRNA, DNA methylation, SNVs, CNVs, and clinical data (OS, DSS, PFI, stage, age, sex) | Multi-omics: mRNA expression (RNA-seq), miRNA expression, lncRNA expression, DNA methylation (Illumina HumanMethylation450), somatic SNVs, CNVs, immune/stromal scores, stemness indices; identification of 11 smoking-related methylation driver genes (EIF5A2, GBP6, HGD, HS6ST1, ITGA5, NR2F2, PLS1, PPP1R18, PTHLH, SLC6A15, YEATS2) and a 46-gene smoking-related prognostic signature; ceRNA network involving miRNAs (e.g., miR-193b-3p, miR-301b, miR-205-5p, miR-132-3p, miR-212-3p, miR-1271-5p, miR-137) | Public TCGA pipelines: RNA-seq [log2(TPM + 1)], Illumina 450K methylation, VarScan2 SNVs, masked CNV segments; CNVs summarized with GISTIC2.0; immune and stromal contexture from ssGSEA and ESTIMATE; chemotherapeutic response predicted using GDSC IC50 modeling (ridge regression via “pRRophetic”) | Survival differences by smoking history evaluated with Kaplan-Meier curves and Cox regression; multivariable Cox models including smoking (non/former/current coded 0/1/2), age, sex, and stage; ssGSEA for 29 immune signatures; ESTIMATE for stromal/immune/estimate scores and tumor purity; BCR diversity, leukocyte fraction, neoantigens, HRD, CTA scores from published TCGA resources; stemness indices (mRNAsi, mDNAsi, DMPsi, ENHsi, EREG-mRNAsi, EREG-mDNAsi) from Malta et al.; mutation and CNV burden and landscapes analyzed with “maftools”; differential expression via edgeR; ceRNA network using miRcode, miRDB, TargetScan, miRTarBase; methylation driver genes defined by inverse correlation (R < −0.4, p < 0.05) between methylation and expression; 46-gene prognostic model built with univariate Cox + LASSO + multivariate Cox; ROC curves and C-index for model performance; nomograms with calibration for each cancer type | Smoking history derived from TCGA clinical data; patients categorized as non-smokers, former smokers, current smokers; in Cox models coded 0, 1, 2, respectively; no pack-years, intensity, or duration data; all analyses stratified/comparative across these three smoking-history groups (non vs. former vs. current) across tumor types | Multi-omics endpoints comparing non-, former-, and current smokers: 29 immune signatures; ESTIMATE immune/stromal/estimate scores and tumor purity; BCR richness/Shannon, leukocyte fraction, neoantigen load, intratumor heterogeneity, HRD and CTA scores; stemness indices; TMB; SNV and CNV landscapes and burdens; differentially expressed mRNAs/lncRNAs/miRNAs and ceRNA network; 11 DNA methylation driver genes and their expression; a 46-gene smoking-related risk score; predicted IC50 to multiple targeted and cytotoxic agents | Current smokers had the worst OS and DSS, former smokers intermediate, non-smokers best; smoking history was an independent prognostic factor for OS and DSS (current > former > never risk); former smokers showed highest immune cell infiltration and immune/ESTIMATE scores and lowest tumor purity; smokers (current and former) had higher BCR diversity, leukocyte fraction, neoantigen load, intratumor heterogeneity, HRD and CTA scores than non-smokers; smoking was associated with higher stemness indices (mRNAsi, mDNAsi, etc.), higher TMB, and increased SNV incidence in multiple genes (e.g., TP53, TTN, MUC16, CSMD3, RYR2, LRP1B, USH2A, SYNE1, ZFHX4, FLG, XIRP2, PCLO) and higher CNV gain/loss burden at key loci (e.g., 3q26, 8q24, 9p21 CDKN2A/B), with partial reduction but not complete reversal after cessation; smokers had higher predicted IC50 (reduced sensitivity) for many targeted and cytotoxic drugs, with non-smokers generally most sensitive and former smokers intermediate; ceRNA network highlighted several miRNAs as potential mediators of tobacco-related tumor biology; 11 methylation driver genes showed inverse methylation-expression relationships and were linked to smoking status; 46-gene model risk scores were highest in current smokers, intermediate in former smokers, lowest in non-smokers
Ferreira et al., 2024 [15] | Brazil; women with breast carcinoma treated in 2 public hospitals in São Paulo state | Longitudinal cohort of 208 women with breast cancer (age 25–65, all parous with ≥ 1 month breastfeeding); 80 smokers and 128 non-smokers; all had core biopsy with anatomopathology and immunohistochemistry, and were followed for 17 months | Immunohistochemistry-based molecular subtypes (gene expression surrogates): luminal A, luminal B, luminal hybrid, HER2 overexpression, triple-negative, and “others” | Standard IHC on histological sections with automated system: antigen retrieval in PTLink (Dako), incubation/development/counterstaining in AutoStainer Link; highly sensitive polymer detection and ready-to-use FLEX antibodies; molecular subtype assignment based on established IHC surrogate criteria from microarray gene-expression-defined subtypes | Descriptive statistics with Kolmogorov-Smirnov test for normality; continuous variables as mean ± SD; group comparisons by ANOVA; categorical variables by chi-square; odds ratio for severe vs. non-severe cancer (smokers vs. non-smokers, “neoadjuvant chemotherapy groups”) with 95% CI; Kaplan-Meier curves for survival by smoking status, log-rank test; p < 0.05 considered significant | Smoking is defined as regular use of ≥ 1 cigarette/day; 80 women were classified as smokers and 128 as non-smokers; no information on duration, intensity, or pack-years; smoking status assessed at baseline (diagnosis) and used as binary exposure (smoker vs. non-smoker) in all analyses | Tumor molecular subtype by IHC (luminal A, luminal B, luminal hybrid, HER2 overexpression, triple-negative); clinical stage (TNM, grouped as early 0–IIB vs. late III–IV); “severe cancer” operationalized via molecular profile and need for neoadjuvant chemotherapy; mortality during 17-month follow-up | Molecular profile distribution differed by smoking: among smokers, luminal A 24.0%, luminal B 31.3%, luminal hybrid 14.4%, HER2 overexpression 7.2%, triple-negative 19.0%, others 4.1%; among non-smokers, luminal A 35.9%, luminal B 35.9%, luminal hybrid 11.7%, HER2 overexpression 6.3%, triple-negative 10.1%, others 0.1%. Smokers had significantly lower luminal A (p = 0.035) and higher triple-negative frequency (p = 0.030). Triple-negative smokers were younger (mean 48.2 years) than triple-negative non-smokers (52.6 years, p = 0.005). Risk of more severe cancer (defined by neoadjuvant chemotherapy groups/molecular severity) was 5.5-fold higher in smokers than non-smokers (OR 5.5; 95% CI 3.0–10.0). Clinical stage distribution (I–IV) did not differ significantly between smokers and non-smokers

ANOVA: analysis of variance; BCR: B-cell receptor (repertoire); BLCA: bladder urothelial carcinoma; BRCA1: breast cancer 1, early-onset; CCND2: cyclin D2; CDKN2A: cyclin-dependent kinase inhibitor 2A; ceRNA: competing endogenous RNA; CESC: cervical squamous cell carcinoma and endocervical adenocarcinoma; CI: confidence interval; CNV: copy number variation; CSMD3: CUB and Sushi multiple domains 3; CTA: cancer-testis antigen; CT: computed tomography; DMPsi: DNA methylation-based stemness index; DNA: deoxyribonucleic acid; DSS: disease-specific survival; EIF5A2: eukaryotic translation initiation factor 5A2; ENHsi: enhancer-based stemness index; ER: estrogen receptor; EREG-mDNAsi: epigenetically regulated DNA methylation-based stemness index; EREG-mRNAsi: epigenetically regulated mRNA-based stemness index; ESCA: esophageal carcinoma; ESTIMATE: Estimation of STromal and Immune cells in MAlignant Tumours using Expression data; FFPE: formalin-fixed paraffin-embedded; FHIT: fragile histidine triad; FLG: filaggrin; GBP6: guanylate binding protein family member 6; GDSC: Genomics of Drug Sensitivity in Cancer; GSTP1: glutathione S-transferase Pi 1; HGD: homogentisate 1,2-dioxygenase; HER2: human epidermal growth factor receptor 2; HER2BC: HER2-positive breast cancer; HNSC: head and neck squamous cell carcinoma; HR: hazard ratio; HRBC: hormone receptor-positive breast cancer; HRD: homologous recombination deficiency; HS6ST1: heparan sulfate 6-O-sulfotransferase 1; IC50: half maximal inhibitory concentration; IHC: immunohistochemistry; ITGA5: integrin subunit alpha 5; Ki-67: Ki-67 proliferation index; KIRP: kidney renal papillary cell carcinoma; LASSO: least absolute shrinkage and selection operator; lncRNA: long non-coding RNA; LRP1B: low-density lipoprotein receptor-related protein 1B; LUAD: lung adenocarcinoma; LUSC: lung squamous cell carcinoma; mDNAsi: DNA methylation-based stemness index; miR: microRNA (prefix in miR IDs); miRNA: microRNA; mRNA: messenger RNA; mRNAsi: 
mRNA expression-based stemness index; MUC16: mucin 16, cell surface associated; NR2F2: nuclear receptor subfamily 2 group F member 2; OR: odds ratio; OS: overall survival; PCLO: piccolo presynaptic cytomatrix protein; PFI: progression-free interval; PFS: progression-free survival; PLS1: plastin 1; PPP1R18: protein phosphatase 1 regulatory subunit 18; PR: progesterone receptor; PRS: post-recurrence survival; PTHLH: parathyroid hormone like hormone; RNA-seq: RNA sequencing; ROC: receiver operating characteristic; RYR2: ryanodine receptor 2; SCGB3A1: secretoglobin family 3A member 1; SD: standard deviation; SFN: stratifin (14-3-3 sigma); SHS: second-hand smoke; SLC6A15: solute carrier family 6 member 15; SNV: single-nucleotide variant; ssGSEA: single-sample gene set enrichment analysis; SYK: spleen tyrosine kinase; SYNE1: spectrin repeat containing nuclear envelope protein 1; TCGA: The Cancer Genome Atlas; TMB: tumor mutational burden; TNBC: triple-negative breast cancer; TNM: tumor-node-metastasis staging system; TP53: tumor protein p53; TPM: transcripts per million; TTN: titin; US: ultrasonography; USH2A: usherin; WEB: Western New York Exposures and Breast Cancer study; XIRP2: xin actin-binding repeat-containing protein 2; YEATS2: YEATS domain containing 2; ZFHX4: zinc finger homeobox 4.
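
Worked example (illustrative, not taken from the included studies' code): the table quotes a pack-year formula, (cigarettes per day ÷ 20) × years, and an odds ratio for HER2 positive conversion in smokers (4/14) vs. non-smokers (2/36) from Takada et al. The Python sketch below recomputes both from the counts reported in the table. The function names `pack_years` and `odds_ratio_wald` are hypothetical, and the Wald (log-odds) interval is an assumption about how the reported CI could be obtained; the study itself reports logistic regression, which for a single binary exposure should give a comparable estimate.

```python
import math


def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Pack-years as quoted in the table: (cigarettes per day / 20) x years."""
    return (cigarettes_per_day / 20.0) * years_smoked


def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald 95% CI from a 2x2 table.

    a = exposed with event,   b = exposed without event,
    c = unexposed with event, d = unexposed without event.
    All four cells must be non-zero.
    """
    or_est = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald (Woolf) approximation
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(or_est) - z * se_log_or)
    ci_high = math.exp(math.log(or_est) + z * se_log_or)
    return or_est, ci_low, ci_high


# Counts from the table: HER2 positive conversion in 4 of 14 smokers
# and 2 of 36 non-smokers (so 10 and 34 without conversion).
or_est, ci_low, ci_high = odds_ratio_wald(a=4, b=10, c=2, d=34)
print(f"OR = {or_est:.1f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")

# One pack per day for 30 years corresponds to 30 pack-years.
print(pack_years(20, 30))
```

Under these assumptions the point estimate is 6.8, in line with the OR reported in the table, and the interval is similarly wide, which reflects the small number of conversion events (6 in total).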