Carcinogenesis Advance Access originally published online on May 10, 2007
Carcinogenesis 2007 28(8):1731-1739; doi:10.1093/carcin/bgm111

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Polymorphism discovery in 62 DNA repair genes and haplotype associations with risks for lung and head and neck cancers

Stefan Michiels1,2,3,†, Patrick Danoy4,†, Philippe Dessen5, Alex Bera4, Thomas Boulet1, Christine Bouchardy6, Mark Lathrop4, Alain Sarasin5 and Simone Benhamou2,3,5,*

1 Department of Biostatistics and Epidemiology, Institut Gustave Roussy, 39 rue Camille Desmoulins, 94805 Villejuif cedex, France
2 Institut National de la Santé et de la Recherche Médicale, U794, Evry, France
3 Université d'Evry, UMR-S794, Evry, France
4 Centre National de Génotypage, 91057 Evry, France
5 Centre National de la Recherche Scientifique, FRE2939, Institut Gustave Roussy, Université Paris-Sud, 39 rue Camille Desmoulins, 94805 Villejuif cedex, France
6 Geneva Cancer Registry, 1205 Switzerland

* To whom correspondence should be addressed. Tel: +33 1 42114139; Fax: +33 1 42115315; Email: simone.benhamou@igr.fr


    Abstract
DNA repair is essential for the maintenance of genetic stability. We undertook sequencing to determine common genetic variants in 70 genes involved in three major repair pathways (base excision repair, nucleotide excision repair and mismatch repair) and in DNA synthesis, and investigated their relationship to lung and head and neck (H-N) cancers. Of the 70 genes examined, 62 were successfully screened (exon coverage >20%) by sequencing exons, parts of introns and flanking regions in 32 DNA samples from healthy Caucasian individuals. The strategy used allowed the detection of almost all variants with a minor allele frequency ≥5% in the regions sequenced. During single-nucleotide polymorphism (SNP) discovery, 772 variants were detected in introns or regions flanking the gene and 313 were found in exons (leading to 113 non-synonymous variations). In total, 695 variants were successfully genotyped in 151 lung cancer cases, 251 H-N cancer cases and 172 hospital controls. Score statistics were used to test differences in haplotype frequencies between cases and controls in an unconditional logistic regression model. To account for multiple testing, we associated an estimated proportion of false discoveries with each P-value. Haplotype analysis revealed potential associations (P < 0.05) between lung cancer and eight genes (MSH3, MLH3, POLK, LIG1, ERCC5, PMS1, POLG2 and RPA3) and between H-N cancer and four genes (PMS1, POLG2, POLR2B and RPA1) with false discovery proportions of 25 and 55%, respectively. The DNA synthesis pathway showed a tendency for more differential SNP allele frequencies between H-N cases and controls than expected by chance (P = 0.05). These results hint at a few potential candidates for further investigation in larger studies.

Abbreviations: BER, base excision repair; CI, confidence interval; H-N, head and neck; LD, linkage disequilibrium; MAF, minor allele frequency; MMR, mismatch repair; mtDNA, mitochondrial DNA; NER, nucleotide excision repair; OR, odds ratio; SNP, single-nucleotide polymorphism; UTR, untranslated region


    Introduction
Worldwide, ~1.2 million new cases of lung cancer and 400 000 cases of head and neck (H-N) cancer (oral cavity, larynx and pharynx) are estimated to occur each year (1). Although most of these cancers are typically caused by smoking, together with excessive alcohol consumption for H-N cancer, inherited factors [e.g. single-nucleotide polymorphisms (SNPs)] could contribute to individual susceptibility to carcinogens. Polymorphisms in DNA repair genes are good candidates for such susceptibility because of their critical role in maintaining genome integrity. Establishing the DNA sequence variants that confer a high cancer risk may allow for identification of high-risk individuals to target prevention efforts.

DNA lesions are constantly produced in our cells either from exogenous origins (such as radiation or chemical carcinogens) or from endogenous sources (such as free radicals). Four main pathways have been developed and maintained during evolution to remove DNA lesions according to their chemical structure [base excision repair (BER), nucleotide excision repair (NER), mismatch repair (MMR) and double-strand break repair]. In the absence of full repair of lesions on genomic DNA, replication can occur, leading to DNA synthesis inhibition that may induce cell apoptosis or allow switching to tolerance mechanisms involving specific DNA polymerases with mutagenic translesion synthesis activities.

Numerous studies have investigated associations between different polymorphisms in different DNA repair genes and lung and H-N cancer risks. The genes that have been most studied are ERCC2, OGG1, XRCC1 and XRCC3, and cancer risks have been evaluated by recent meta-analyses (2–5). An increased risk of lung cancer was suggested for ERCC2 variants in codons 312 and 751 (5) and for the OGG1 variant in codon 326 (3). In contrast, there were no significant associations between cancer sites and XRCC1 variants in codons 194, 280 and 399 and the XRCC3 variant in codon 241 (4,5). In general, genetic polymorphisms have been studied one at a time or in pairwise combinations for their potential implication in lung and H-N cancer risks. Because these cancers are likely to result from genetic variation in many DNA repair genes, most of which have small effects, analysis of multiple genetic variants within a gene, even multiple genes within an entire pathway, should be considered in association studies. Variants within a gene occur frequently in particular combinations, or haplotypes, due to extensive linkage disequilibrium (LD). In principle, genotyping a subset of the variants (haplotype tags) makes it possible to test the effects of all the sites with a similar haplotype distribution. Haplotypes of SNPs are also potentially important since different combinations of particular alleles in the same gene may have different effects on the protein product or on transcriptional regulation, even in the absence of amino acid changes.

The advent of new genomic tools makes it possible to characterize the full spectrum of genetic variation within genes. In October 2005, the public database dbSNP contained >10 million human SNPs, of which only ~5 million had been validated (see http://www.ncbi.nlm.nih.gov/projects/SNP/snp_summary.cgi). Results from several large sequencing efforts for variation discovery on DNA repair genes have been recently reported (6–9). In the USA population (6–8), sequence analysis was conducted in anonymized DNA samples of 90–102 individuals from the Polymorphism Discovery Resource (Coriell Institute for Medical Research, Camden, NJ). These individuals were selected to represent the major ethnic groups of the population, although the ethnic origin of specific individuals is unknown. Accordingly, potential differences in allele frequencies between ethnic groups cannot be assessed. In the Japanese population (9), searching for variants in 36 DNA repair genes was performed on DNA from 59 individuals, most of them being cancer patients. Given the assumption that SNPs resulting in protein change are more likely to contribute to disease and the importance of determining allele frequencies for candidate variants prior to selecting them for analysis in a specific ethnic group (10), SNP discovery efforts in coding regions and in diverse populations are therefore still necessary for further progress in association studies on complex diseases.

In the present study, we set out to investigate a total of 70 genes, categorized as involved in the maintenance of genetic stability, using a two-stage approach: (i) SNP discovery by sequencing 32 DNA samples from healthy French Caucasian individuals and (ii) genotyping of sequence variants in 574 subjects enrolled in a French case–control study on lung cancer and on H-N cancer for risk assessment. We included all genes known at the time of the initiation of the study in the three main pathways BER, NER and MMR, and also added key representative genes involved in DNA replication, translesion synthesis and transcription, hereafter called the DNA synthesis pathway.


    Materials and methods
SNP discovery
To determine the pattern of genetic variation in the genes studied, we resequenced DNA coding regions including up to 50 bp of introns adjacent to exons covering the splice sites. SNP discovery was performed on a control group of 32 DNA samples from healthy Caucasian individuals (16 pools of two individuals) for all the genes included in the study except three for which SNP discovery was performed on 32 (ERCC1 and ERCC2) or 96 (RAD23A) individual DNAs. The control sample consisted of French Caucasian subjects from the Epidemiological study on the Genetics and Environment of Asthma (11). Subjects were both male and female individuals without disease history. The minimal sample size of 32 allowed us to detect SNPs with a minor allele frequency (MAF) of at least 5% with a probability of 96%.
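The 96% figure can be reproduced with a short calculation. The sketch below is illustrative and rests on two assumptions that the text does not state explicitly: the 2 × 32 = 64 chromosomes are treated as independent draws, and a variant is counted as detected as soon as one copy is present in the sample (so any loss of sensitivity from pooling is ignored). The function name is hypothetical.

```python
def detection_probability(maf: float, n_individuals: int) -> float:
    """Probability that a variant with the given minor allele frequency
    appears at least once among the 2 * n_individuals sampled chromosomes.

    Assumes independent chromosomes and perfect detection of any copy present.
    """
    n_chromosomes = 2 * n_individuals
    # Complement of "no chromosome carries the minor allele".
    return 1.0 - (1.0 - maf) ** n_chromosomes


if __name__ == "__main__":
    p = detection_probability(maf=0.05, n_individuals=32)
    print(f"P(detect MAF = 5% in 32 individuals) = {p:.2f}")
```

Under these assumptions the probability is 1 − 0.95^64, which rounds to 0.96; rarer variants are detected less reliably with the same panel.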

Oligonucleotides for polymerase chain reaction and sequencing were designed using the primer3 program (http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi) in order to amplify 800–1200 bp exon-containing DNA fragments and internal sequencing primers were positioned to cover exons in both forward and reverse directions. The coverage of targeted exonic regions was calculated as the percentage of bases within the exons covered by at least one sequence in SNP discovery. Pooled genomic DNAs (20 ng) were amplified using ExTaq (Takara, Otsu, Japan), followed by purification on Sephadex P100. Polymerase chain reaction products were then used as template for sequencing reactions with BigDye Terminator Cycle Sequencing kit (PE Applied Biosystems, Foster City, CA). Purified sequencing reactions were run on ABI 3700 or 3730xl DNA analyzers. Sequence analysis, SNP detection and genotype calling were performed using the Genalys software (12) that allows for genotype calls obtained from four chromosomes.

SNP selection and genotyping
For each gene, haplotypes were determined from the genotyping data obtained on the control group of 32 healthy individuals using an Expectation-Maximization algorithm. First, 324 variants were selected to account for all estimated haplotypes with frequencies >5%. These tagSNPs represented a median genetic variation (haplotype diversity) by gene of 83% (range: 43–100%). All of them were retained for further genotyping of the subjects of the case–control study. These markers were supplemented with 488 SNPs from the 70 DNA repair genes taken from the HapMap database (release 14), all having a MAF >5% and selected as follows: if any HapMap tagSNP was also genotyped in our control group, that SNP and all other HapMap tagSNPs in LD with it (defined by r² > 0.80) were omitted from the selection process.

Genotyping of subjects from the case–control study on lung cancer and H-N cancer (n = 574) was performed using several procedures: Illumina BeadArray genotyping platform (742 variants) as described (13), Taqman (10 variants, according to the manufacturer's instructions using 5.0 ng of purified and quantified genomic DNA) or direct sequencing (60 variants). The genotype data were subjected to various quality control procedures. Quality control on Illumina genotype data was verified by including two Centre d'Etude du Polymorphisme Humain control DNAs in duplicate in each DNA plate. These DNA sample plates were all genotyped against two Illumina genotype panels that included all markers analyzed. Genotype concordance data for Centre d'Etude du Polymorphisme Humain duplicate samples were 99.85 and 99.94% within plates and 99.86 and 99.95% between plates for both genotyping panels. We removed (i) variants for which >50% of the genotype data were missing (6 variants); (ii) variants for which the MAF was <2.5% in the controls (84 variants) or (iii) variants for which there was a significant deviation from the Hardy–Weinberg proportions in the controls [P < 0.05 as assessed by Fisher's exact test (14), 27 variants]. A total of 695 variants (688 SNPs and 7 insertion/deletion polymorphisms) were retained in the analysis at the end of these steps.
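The three exclusion rules described above can be sketched as a per-variant filter. This is a minimal illustration and not the authors' pipeline: the Hardy–Weinberg test shown is the standard exact test conditional on the observed allele counts, which may differ in detail from the procedure of reference (14), and the function names and the genotype-count tuple format are assumptions made for this example.

```python
from math import exp, lgamma, log


def hwe_exact_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Exact Hardy-Weinberg P-value, conditional on the allele counts.

    Sums the probabilities of every possible heterozygote count that is
    no more likely than the observed one.
    """
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab  # copies of allele A
    n_b = 2 * n - n_a      # copies of allele B

    def log_prob(het: int) -> float:
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (het * log(2.0)
                + lgamma(n + 1) - lgamma(hom_a + 1)
                - lgamma(het + 1) - lgamma(hom_b + 1)
                + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1))

    # Heterozygote counts share the parity of n_a and cannot exceed min(n_a, n_b).
    probs = {h: exp(log_prob(h)) for h in range(n_a % 2, min(n_a, n_b) + 1, 2)}
    observed = probs[n_ab]
    total = sum(p for p in probs.values() if p <= observed * (1 + 1e-9))
    return min(1.0, total)


def passes_qc(missing_rate: float, control_counts: tuple,
              max_missing: float = 0.50, min_maf: float = 0.025,
              hwe_alpha: float = 0.05) -> bool:
    """Apply the three filters: missingness, MAF in controls, HWE in controls.

    control_counts is (n_AA, n_AB, n_BB) among controls (assumed format).
    """
    n_aa, n_ab, n_bb = control_counts
    n_alleles = 2 * (n_aa + n_ab + n_bb)
    if n_alleles == 0 or missing_rate > max_missing:
        return False
    freq_a = (2 * n_aa + n_ab) / n_alleles
    if min(freq_a, 1.0 - freq_a) < min_maf:
        return False
    return hwe_exact_pvalue(n_aa, n_ab, n_bb) >= hwe_alpha
```

The default thresholds mirror the text (>50% missing data, MAF <2.5% in controls, Hardy–Weinberg P < 0.05 in controls). Applied to the 812 genotyped variants, the reported exclusions of 6, 84 and 27 variants leave the 695 retained for analysis.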

Case–control study
Details on the study set-up and on subject recruitment have been previously reported (15,16). Briefly, peripheral blood samples were available for 151 patients with lung cancer, 251 with H-N cancer (including cancers of the oral cavity, larynx and pharynx) and 172 control individuals who fulfilled the criteria described below. Cases were patients with histologically confirmed incident squamous cell carcinoma (or small cell carcinoma for lung cancer). The control group, frequency matched by age, sex and hospital, consisted of consecutive patients without previous or current malignant diseases. The main medical diagnoses in the control population were rheumatological (33%), mainly lumbago and sciatica (71%), infectious and parasitic (10%), respiratory (9%), cardiovascular (8%), digestive (6%) and traumatological diseases (6%), general symptoms (7%) or other categories (21%). All cases and controls were of Caucasian origin and regular smokers, defined as people having smoked at least five cigarettes (or cigars or pipe) per day for >5 years. They underwent an identical interview with a standard questionnaire to collect detailed information on demographic factors, medical history, lifetime tobacco and alcohol use and occupational exposures. The daily consumption of each type of tobacco smoked was expressed in g/day (1 g for cigarette, 2 g for cigar and 3 g for pipe) (17). The average number of grams of tobacco smoked per day was calculated by dividing the cumulative lifetime tobacco consumption by the overall duration of smoking. Lifetime smoking exposure was also expressed in pack-years of smoking (years smoked times the number of packs of cigarettes/day).
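The exposure definitions in this paragraph translate directly into a few small functions. The sketch below is illustrative only: the function names and the input formats are hypothetical, and the pack size of 20 cigarettes is an assumption, since the text does not state it.

```python
# Gram equivalents given in the text: 1 g per cigarette, 2 g per cigar, 3 g per pipe.
GRAMS_PER_UNIT = {"cigarette": 1.0, "cigar": 2.0, "pipe": 3.0}


def daily_tobacco_grams(units_per_day: dict) -> float:
    """Daily tobacco consumption in g/day from counts of each product smoked."""
    return sum(GRAMS_PER_UNIT[product] * n for product, n in units_per_day.items())


def average_grams_per_day(periods: list) -> float:
    """Cumulative lifetime consumption divided by the overall duration of smoking.

    periods is a list of (grams_per_day, years) tuples (assumed format).
    """
    total_years = sum(years for _, years in periods)
    cumulative = sum(grams * years for grams, years in periods)
    return cumulative / total_years


def pack_years(cigarettes_per_day: float, years: float,
               cigarettes_per_pack: int = 20) -> float:
    """Years smoked times packs per day; the pack size of 20 is an assumption."""
    return years * cigarettes_per_day / cigarettes_per_pack


def is_regular_smoker(units_per_day: float, years: float) -> bool:
    """Study definition: at least five units per day for more than 5 years."""
    return units_per_day >= 5 and years > 5
```

For example, a subject smoking 10 cigarettes and one cigar per day would be recorded at 12 g/day, and 20 cigarettes per day for 30 years corresponds to 30 pack-years under the assumed pack size.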

Statistical analysis of case–control study
Separate analyses were performed for lung cancer and H-N cancer. For each variant, allele frequencies in cases and controls were compared by Fisher's exact tests. Rather than accepting or rejecting a single hypothesis for each SNP, we estimated the proportion of false positives within a subset of hypotheses where the null hypothesis of no association is unlikely to be true (false discovery rate) (18). This statistical procedure is appropriate to adjust for multiple testing in large-scale association studies (19). We used the bootstrap estimation method from Storey et al. (20) to provide for each hypothesis test a q-value, which estimates the minimum false discovery rate that can be attained when all tests with lower or equal P-values are called significant.
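To make the notion of a q-value concrete, the sketch below computes q-values from a list of P-values. It is a deliberate simplification and not the bootstrap method of Storey et al. (20) used in the study: the proportion of true null hypotheses is fixed at 1, which yields the more conservative Benjamini–Hochberg-style q-values. The function name is hypothetical.

```python
def conservative_qvalues(pvalues: list) -> list:
    """q-value for each P-value, assuming all hypotheses could be null (pi0 = 1).

    q_(i) = min over j >= i of m * p_(j) / j, where p_(1) <= ... <= p_(m).
    Storey's method would multiply these by an estimate of pi0 <= 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so each q-value is the minimum
    # estimated false discovery rate over all thresholds at or above it.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        qvalues[index] = running_min
    return qvalues


if __name__ == "__main__":
    for p, q in zip([0.01, 0.04, 0.03, 0.5],
                    conservative_qvalues([0.01, 0.04, 0.03, 0.5])):
        print(f"P = {p:.2f}  q = {q:.3f}")
```

A q-value attached to a set of tests called significant is read as the estimated share of false positives in that set, which is how the proportions of false discoveries reported in the Results should be interpreted.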

For each gene, the Haplo Stats library v1.2.2 for R (21) was used to test for association between haplotypes and cancer risk. We retained the largest haplotype block in which all polymorphisms were in pairwise LD (|D'| > 0.85) with each other among controls. The Haplo Stats method is based on a prospective likelihood that depends on haplotype frequencies estimated by an improved Expectation-Maximization algorithm to test the statistical association between haplotype and phenotype when linkage phase is ambiguous. It is based on score statistics and provides both a global test and haplotype-specific tests (22). The global test with multiple degrees of freedom was performed to test the hypothesis that a difference in haplotype frequencies is seen between lung cancer cases and controls or between H-N cancer cases and controls. We chose this score statistic because haplotype frequencies and subject-specific probabilities are calculated under the null hypothesis of no association, such that there is no issue in pooling cases and controls together. We retained only those genes for which the global test yielded a significant result (P < 0.05) for further interpretation of tests for single haplotype effects. Haplotype-specific odds ratios (ORs) were calculated assuming an additive model and using unconditional logistic regression, controlling for age, pack-years of smoking and daily alcohol consumption where relevant. Additional adjustment of the haplotype analyses for sex did not alter our results. No imputation procedures were applied in the case of missing data. The most common haplotype among controls was used as the reference in the logistic regression model, and rare haplotypes (frequency <5% in the pooled case and control set) were combined.
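The pairwise LD criterion used to define a haplotype block can be illustrated with the usual definition of D'. The sketch below assumes that two-locus haplotype frequencies are already available (in the study they would come from the Expectation-Maximization estimates); the function names and the dictionary layout are hypothetical, and this is not the Haplo Stats implementation.

```python
from itertools import combinations


def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Signed D' between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d / d_max if d_max > 0 else 0.0


def is_block(snps: list, pair_dprime: dict, threshold: float = 0.85) -> bool:
    """True if every pair of SNPs in the set has |D'| above the threshold.

    pair_dprime maps frozenset({snp1, snp2}) to D' (assumed layout).
    """
    return all(abs(pair_dprime[frozenset(pair)]) > threshold
               for pair in combinations(snps, 2))
```

With `p_a = p_b = 0.3`, a haplotype frequency `p_ab = 0.3` gives D' = 1 (complete LD), whereas `p_ab = 0.09` gives D' = 0 (independence).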

For each pathway analyzed in this study, we performed a permutation test to determine whether more SNPs in that pathway had allele frequencies that differed between cases and controls than would be expected by chance. We used the one-sided Kolmogorov–Smirnov statistic to test the null hypothesis that the observed P-values come from a distribution that is stochastically less than or equal to the uniform distribution. We randomly permuted the case and control labels 1000 times and recalculated a Kolmogorov–Smirnov statistic each time; the P-value is then equal to the proportion of times this relabeling resulted in a higher Kolmogorov–Smirnov statistic.
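The permutation procedure can be sketched as follows. Several choices here are assumptions made to keep the example self-contained: the per-SNP test is a two-proportion z-test on allele counts standing in for Fisher's exact test, genotypes are coded as minor-allele counts with no missing data, and all function names are hypothetical.

```python
import math
import random


def allele_test_pvalue(case_alt, case_total, ctrl_alt, ctrl_total):
    """Two-sided two-proportion z-test on allele counts (stand-in for Fisher's exact test)."""
    pooled = (case_alt + ctrl_alt) / (case_total + ctrl_total)
    variance = pooled * (1 - pooled) * (1 / case_total + 1 / ctrl_total)
    if variance == 0:
        return 1.0
    z = (case_alt / case_total - ctrl_alt / ctrl_total) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2))


def ks_excess_small_p(pvalues):
    """One-sided KS statistic D+ = max_i (i/n - p_(i)) against the uniform
    distribution; it is large when small P-values are over-represented."""
    ordered = sorted(pvalues)
    n = len(ordered)
    return max((i + 1) / n - p for i, p in enumerate(ordered))


def pathway_pvalue(genotypes, is_case, n_perm=1000, seed=0):
    """Permutation P-value for one pathway.

    genotypes[s][j] is the minor-allele count (0, 1 or 2) of subject s at SNP j;
    is_case[s] is True for cases and False for controls.
    """
    rng = random.Random(seed)
    n_snps = len(genotypes[0])

    def statistic(labels):
        n_case = sum(labels)
        n_ctrl = len(labels) - n_case
        pvalues = []
        for j in range(n_snps):
            case_alt = sum(g[j] for g, c in zip(genotypes, labels) if c)
            ctrl_alt = sum(g[j] for g, c in zip(genotypes, labels) if not c)
            pvalues.append(
                allele_test_pvalue(case_alt, 2 * n_case, ctrl_alt, 2 * n_ctrl))
        return ks_excess_small_p(pvalues)

    observed = statistic(is_case)
    labels = list(is_case)
    higher = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # relabel subjects, keeping group sizes fixed
        if statistic(labels) > observed:
            higher += 1
    return higher / n_perm
```

Permuting subject labels rather than individual SNPs preserves the LD between SNPs within a pathway, so the reference distribution of the statistic reflects the correlation among the tests.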


    Results
SNP discovery
The 70 genes, coding for 88 transcripts, were screened for variants by direct sequencing of exonic and adjacent intronic regions and parts of the flanking sequence (see supplementary Table S1 for summarized results and Table S2 for detailed information, available at Carcinogenesis Online). All data have been cataloged in the dbSNP database.

The resequenced region spanned a total of 453 533 bp of which 406 066 bp (89.5%) were covered by >80% of SNP discovery samples. No screening could be performed for eight genes (TDG, DDB2, XPA, MSH2, PMS1, PMS2, POLR2J and RFC5). Lack of complete coverage and unsuccessful screening could be attributed to failure in designing polymerase chain reaction or sequencing primers or in optimizing reaction conditions, despite several attempts.

A total of 1085 sequence variants were detected on the 62 genes successfully screened (coverage of exons >20%). These variants included 1013 SNPs and 72 insertions/deletions. Also, three triallelic SNPs were detected in our sample population. Three hundred and seventy-eight (34.8%) of these variants were not present in dbSNP (build 124), and therefore represent novel polymorphisms. On average, 16.2 polymorphisms were identified per gene and we detected one SNP every 418 bp sequenced. Of the 313 exonic alterations, 24 and 66 were located in the 5' untranslated region (UTR) and 3'UTR, respectively, whereas 223 were found in the coding regions. In the 5'UTR, only one deletion was identified whereas eight insertions/deletions were detected in the 3'UTR. Within the coding region, 110 SNPs are synonymous and the remaining 113 non-synonymous polymorphisms result in 108 amino acid substitutions, three premature stops and two insertions/deletions of 3 bp. Interestingly, the insertions/deletions located in coding regions result in the insertion/deletion of an entire codon and therefore do not give rise to frameshift mutations. The 772 non-exonic polymorphisms include 61 insertions/deletions.

While we concentrated our resequencing effort on exon-containing regions, the proportion of SNPs detected in exons was not different in the 378 newly discovered polymorphism dataset compared with the complementary variant dataset of 707 SNPs that were in dbSNP build 124 (29 versus 29%, chi-squared test: P = 0.94). For SNPs detected in exons, the proportion of non-synonymous variants identified was similar in the respective datasets (38.9 versus 34.6%, P = 0.46). According to expectations, newly discovered SNPs were mainly rare variants, 78.4% with MAF <5% compared with 22.2% of SNPs also in dbSNP (P < 0.001). We also detected more insertions/deletions (13 versus 3%, P < 0.001) in our newly discovered polymorphism dataset.

Case–control study on lung cancer and on H-N cancer
Subjects.
Of the 151 lung cancer cases, 99 (65%) had squamous cell carcinoma and 52 (35%) had small cell carcinoma. The main characteristics of the study population are presented in Table I. The mean age was slightly higher for lung cancer cases than for H-N cancer cases and controls. About 95% of cases and controls were men. The average number of pack-years of smoking was significantly higher in cases than in controls. H-N cancer cases had greater daily consumption of alcohol than controls.


Table I. Main characteristics of subjects from the case–control study

 
Genotyping data.
Genotype data on 695 variants were incorporated into the analysis after quality control. One gene, POLA, contained no variants after filtering. The average number of variants per gene was 10 [range (0–34)]. In the control population, 73% of the 695 variants had a MAF >10%.

Association with cancer risk.
Allele frequencies in each cancer group were compared with those in controls for each of the 695 variants (supplementary Table S3 is available at Carcinogenesis Online). In the lung case–control analysis, a univariate P-value < 0.05 was found for 46 SNPs within 20 genes belonging to the four pathways (three genes in BER, four genes each in NER and MMR and nine genes in DNA replication, translesion synthesis and transcription) (Table II). The estimated proportion of false discoveries (q-value) among these 46 SNPs was 72%. Of those 46 SNPs, six (13%) were within the exonic flanking region, two (4%) were within the UTR, 35 (76%) were within the introns and three (7%) were within the exons: one synonymous SNP in exon 3 of the RECQL4 gene, A > G Glu44Glu (rs2306386), and two non-synonymous SNPs, one in exon 1 of the POLG2 gene, G > A Ser56Asn (rs1427463) and one in exon 2 of the MLH3 gene, C > T Pro844Leu (rs175080) (supplementary Table S2 is available at Carcinogenesis Online). For the H-N case–control analysis, a putative difference in allele frequencies was observed between cases and controls for 29 SNPs within 13 genes (univariate P < 0.05 for three genes each in BER and MMR, one gene in NER and six genes in DNA replication, translesion synthesis and transcription) (Table III). Eighty-six percent of these SNPs were estimated to be false positives (q-value). Eight of the 29 SNPs (28%) were in the exonic flanking region, one (3%) was in the UTR, 19 (66%) were in the introns and one (3%) was in an exon; this latter SNP (rs13436, rs3182008) within exon 26 of the LIG1 gene does not result in an amino acid change (G > C, Ala814Ala) (see supplementary Table S2 for SNP locations, available at Carcinogenesis Online). Four putative SNPs, belonging to the genes LIG1, MSH3, PMS2 and MGMT, were common to both cancer sites.


Table II. Associations (P < 0.05) between SNP allele frequencies and lung cancer

 


Table III. Associations (P < 0.05) between SNP allele frequencies and H-N cancer

 
Haplotype associations with cancer risk.
For nine genes, no haplotype analysis could be performed for the following reasons: for five genes, only one SNP was kept after filtration (DDB1, POLD4, POLR2I, POLR2J and POLR2L); for three genes, we found only one haplotype with an estimated frequency >5% (POLB, POLR2G and POLR2H) and for the RFC2 gene, the two retained SNPs were not in LD among controls (D' = 0.32).

For the remaining genes, we calculated a global score statistic to test the hypothesis of a difference in haplotype frequencies between cancer cases and controls for the largest haplotype block within that gene while adjusting for the covariates.

For the lung case–control analysis, eight genes (MSH3, MLH3, POLK, LIG1, ERCC5, PMS1, POLG2 and RPA3) showed a global P-value < 0.05 with an associated q-value or expected proportion of false discoveries equal to 25%. The haplotypes for these eight genes are provided in Table IV together with their estimated frequencies and haplotype-specific ORs, both unadjusted and adjusted for the covariates. We found a potentially decreased risk for lung cancer associated with one haplotype of the MSH3 gene [haplotype 6, adjusted OR = 0.30, 95% confidence interval (CI) = 0.12–0.74], of the MLH3 gene (haplotype 2, OR = 0.69, 95% CI = 0.49–0.97), of the POLK gene (haplotype 3, OR = 0.36, 95% CI = 0.17–0.75) and of the LIG1 gene (haplotype 3, OR = 0.45, 95% CI = 0.25–0.82) as compared with the most frequent haplotype of the corresponding genes. For the four other genes ERCC5, PMS1, POLG2 and RPA3, it was the contrast between the pool of rare haplotypes and the most frequent haplotype that might be associated with lung cancer risk (Table IV).


Table IV. Significant variation (P < 0.05) in haplotypic frequencies between cases and controls

 
For the H-N case–control analysis, four genes (PMS1, POLG2, POLR2B and RPA1) showed a global P-value < 0.05 with an associated q-value or expected proportion of false discoveries of 55%. A significantly increased risk of H-N cancer was associated with one haplotype of the RPA1 gene (haplotype 2, OR = 1.60, 95% CI = 1.02–2.47). An increase in cancer risk, although not significant, was also observed for one haplotype of the POLR2B gene (haplotype 2, OR = 1.44, 95% CI = 0.94–2.20). For the two genes PMS1 and POLG2, it was the contrast between the pool of rare haplotypes and the most frequent haplotype that might be significantly associated with the risk for H-N cancer (Table IV).

Pathway associations with cancer risk.
None of the pathways showed a significantly higher number of SNPs with differential allele frequencies between lung cancer cases and controls than would be expected by chance (BER: P = 0.67, NER: P = 0.40, MMR: P = 0.45 and DNA synthesis: P = 0.35). As for H-N cancers, the so-called DNA synthesis pathway showed a tendency to have more polymorphisms with differential frequencies between cases and controls than would be expected by chance (P = 0.05), whereas for the other pathways the P-values were clearly non-significant (BER: P = 0.24, NER: P = 0.17 and MMR: P = 0.11).


    Discussion
Many of the DNA repair genes screened in our study have previously been, or are currently being, analyzed for SNP discovery (6–9). The main sequencing efforts were conducted in an anonymous panel of individuals representative of the ethnic diversity of the USA population (6,8). The frequency of sequence variations in ethnic subpopulations could therefore not be assessed and the stratified nature of the panel can have effects on both the haplotype diversity and patterns of LD (6). Our study provides an important pattern of DNA repair genetic variation in a homogeneous healthy French Caucasian population. This catalog could serve as a resource for future association studies.

As a first step of the present study, we undertook a SNP discovery project for genes involved in the maintenance of genetic stability by resequencing a panel of 32 Caucasian individuals. We expected to have detected the large majority of frequent variants in the sequenced regions given that we had a 96% chance for the detection of SNPs with a MAF ≥5% in this panel. We complemented our SNP panel with validated markers from HapMap to provide additional coverage of the genetic variation within these genes.

The use of hospital controls might be a limitation in case–control studies, especially in the case of associations between genotypes and the diseases. To circumvent this potential problem, we only included subjects with various non-malignant diseases. In addition, the MAFs observed in our control population were compared with those found in the independent sample of healthy subjects used for SNP discovery. A strong correlation was observed between the MAFs of the genotypes for 422 SNPs available in both populations (correlation coefficient R = 0.92, P < 0.0001). Also, for several polymorphisms in DNA repair genes, the variant allele frequencies observed in our hospital control population were within the range of previously published frequencies in European populations. For example, we found variant allele frequencies for the XRCC1-Arg194Trp (rs1799782) and XRCC1-Arg399Gln (rs25487) polymorphisms of 0.07 and 0.35, respectively. These figures are very consistent with those reported for the Trp allele in a large-scale study in Europe and in another study in France (0.07) (3,23), as well as for the Gln allele in Europe (0.35) (3), Italy (0.34) (24), France (0.36) (23) and Germany (0.39) (25).

We applied both univariate analyses of SNPs and haplotype analyses to determine whether sequence variations were associated with cancer. The univariate analysis yielded 46 SNPs with P < 0.05 for the lung case–control study and 29 SNPs with P < 0.05 for the H-N case–control study but came with high estimated proportions of false discoveries: q-values 72 and 86%, respectively. In the haplotype analyses, the estimated proportions of false discoveries were substantially lower: a q-value of 25% for the eight genes with a global P < 0.05 for the association with lung cancer and a q-value of 55% for the four genes with a global P < 0.05 for the association with H-N cancer. Therefore, we focus below on the potential association to haplotypes. All the genes retained by the haplotype analysis also contained single variant sites meeting the P < 0.05 criterion for potential association.

Among the four categories of genes analyzed, three pathways (BER, MMR and NER) are involved in the repair of DNA damage produced directly or indirectly from carcinogens associated with lung cancer (mainly cigarette smoking) and H-N cancer (mainly cigarette smoking and alcohol consumption). For both BER and NER, only one of the respective 10 and 14 genes was found to have a haplotype that was potentially associated with lung cancer. First, LIG1, or DNA ligase 1, is a crucial enzyme, necessary not only for BER but also during MMR, NER, double-strand break repair and normal DNA replication, and it has previously been reported to be associated with cancer (26–28). Second, ERCC5, which encodes a 3'-endonuclease enzyme, was found to have a haplotype that was potentially associated with lung cancer. Rare mutations in ERCC5 are causative of DNA repair-deficient diseases, abnormal apoptosis and a high level of cancers (29,30). Four ERCC5 polymorphisms have so far been investigated for their potential implication in cancer risk. Most studies on lung cancer focused on the D1104H polymorphism (rs17655) and did not provide consistent results (9,31–33).

MMR is a pathway that is necessary for correction of the errors made during normal DNA replication by the replicative DNA polymerases (34). It is also involved in the recognition and perhaps the repair of certain mispaired nucleotides at DNA lesions, such as those produced by cigarette smoking (35,36). Among the seven major genes involved in MMR that were tested here, three (MLH3, MSH3 and PMS1) exhibited haplotypes that are potentially associated with lung cancer and PMS1 was also potentially associated with H-N cancer. The PMS1 protein interacts with MLH1 to form a complex acting as a matchmaker to signal mismatch recognition to downstream repair enzymes and acts as a key player in this mutation avoidance pathway (37). The fact that the PMS1 gene might be associated with the two cancer types, as well as the putative association of two other MMR enzymes with one of the two cancer types, is compatible with the major role of this pathway in modulating the risk of cancer in humans (38). However, to our knowledge, no data on PMS1 gene polymorphisms have been published in relation to lung and H-N cancers. For the MLH3 and MSH3 genes, one study by Sakiyama et al. (9) on lung cancer suggested different allele distributions between adenocarcinoma cases and controls for the MLH3 P844L (rs175080) polymorphism and no association for the MSH3 A1036T (rs26279) polymorphism.

The last group of genes corresponds to several pathways implicating normal DNA replication, error-prone translesion synthesis and basal transcription. Among the 39 genes, two DNA polymerases [the translesion POLK DNA polymerase and POLG2, subunit 2 of the mitochondrial DNA (mtDNA) polymerase] and one of the subunits of the major single-stranded binding complex (RPA3) that is necessary for both DNA repair and replication provided suggestive evidence of association with lung cancer. Of those, POLG2 was one of the two genes that showed potential haplotype association with H-N cancer, together with RPA1 (another subunit of the single-strand binding complex replication protein A). POLG2 encodes a 55 kDa protein that interacts with the catalytic activity of the DNA polymerase gamma necessary for replication and repair of mtDNA (39) and is, therefore, involved in the maintenance of mtDNA stability and integrity. Mutations in mtDNA can give rise to several diseases including cancers (40). Replication protein A is a heterotrimeric, single-stranded DNA-binding protein. It is conserved in all eukaryotes and is essential for DNA replication, DNA repair and recombination, and for modulating MMR activity (41). Genetic polymorphisms in these genes could therefore be highly relevant in the development of smoking-related cancers. In the pathway-by-pathway analysis, only the DNA synthesis pathway showed a tendency toward more differential allelic frequencies between H-N cancer cases and controls than expected by chance (P = 0.05).

In conclusion, these results suggest that the MMR and the DNA synthesis pathways might play an important role in the studied cancers. The MMR pathway contributes to tumor suppression by reducing mutations and promoting apoptosis in response to some DNA damage (34). This process is known to correct DNA errors made during normal DNA replication (42) and is able to recognize some bulky adducts mispaired with bases during translesion synthesis. Different activities of these genes, as could result from SNP variations, may modify the level of repair and the level of error-prone translesion synthesis, leading in turn to higher mutation rates and therefore to higher cancer risks.

In most cases, the SNPs we have identified with putatively different allele frequencies between case and control populations do not automatically imply changes in the corresponding protein structures or gene expression. However, it is likely that many causal variants will be non-coding (43). If our typed variants do not directly influence cancer susceptibility, other variants in LD with them might do so. We found moderate estimated relative risks of cancer associated with a specific haplotype for some of the DNA repair genes, with high values of expected false discovery rates. This might be related to the small sample size of our study: it is expected that causal variants for complex diseases might have rather small effects and that large studies are needed to detect them (44). Replication of these preliminary results in larger case–control studies will be essential.
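
As a rough illustration of why small effects call for large studies, the sketch below approximates the number of alleles needed per group to detect a given per-allele odds ratio with a two-sided two-proportion z-test. It is a textbook approximation under stated assumptions (equal numbers of cases and controls, independent alleles, a simple allelic test) and is not the analysis performed in this study. The function names, the control allele frequency and the odds ratios are hypothetical.

```python
import math
from statistics import NormalDist


def case_allele_frequency(control_freq, odds_ratio):
    """Allele frequency in cases implied by a per-allele odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def alleles_per_group(control_freq, odds_ratio, alpha=0.05, power=0.80):
    """Approximate alleles needed per group (cases and controls alike).

    Uses the normal approximation for comparing two proportions with a
    two-sided test at level alpha and the requested power.
    """
    p0 = control_freq
    p1 = case_allele_frequency(p0, odds_ratio)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    variance = p0 * (1.0 - p0) + p1 * (1.0 - p1)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p0) ** 2)


if __name__ == "__main__":
    # Hypothetical scenario: control allele frequency of 0.20.
    for odds_ratio in (2.0, 1.5, 1.2):
        n_alleles = alleles_per_group(0.20, odds_ratio)
        # Each individual carries two alleles at an autosomal locus.
        print(odds_ratio, n_alleles, math.ceil(n_alleles / 2))
```

The required sample size grows quickly as the odds ratio approaches 1, and this calculation does not yet account for the multiple-testing burden of screening many SNPs.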


    Supplementary material
 
Supplementary Tables S1–S3 can be found at http://carcin.oxfordjournals.org/


    Funding
 
Swiss Cancer League, Switzerland (KFS1069-09-2000); League against Cancer of Fribourg, Switzerland (FOR381.88); Cancer Research, Switzerland (AKT 617); Gustave-Roussy Institute, Villejuif, France (88D28) to Clinical Research against Cancer; Association pour la Recherche sur le Cancer, Villejuif, France (SUBV 5125 and 5215).


    Footnotes
 
{dagger} These authors contributed equally to this work.


    Acknowledgments
 
Conflict of Interest statement: None declared.


    References
 

  1. World Cancer Report. Lung cancer and head and neck cancer. In: International Agency for Research on Cancer IARC Press—Stewart BW, Kleihues P, eds. (2003) Lyon. 218–232.
  2. Benhamou S, et al. ERCC2/XPD gene polymorphisms and lung cancer: a HuGE review. Am. J. Epidemiol. (2005) 161:1–14.
  3. Hung RJ, et al. Genetic polymorphisms in the base excision repair pathway and cancer risk: a HuGE review. Am. J. Epidemiol. (2005) 162:925–942.
  4. Han S, et al. DNA repair gene XRCC3 polymorphisms and cancer risk: a meta-analysis of 48 case-control studies. Eur. J. Hum. Genet. (2006) 14:1136–1144.
  5. Manuguerra M, et al. XRCC3 and XPD/ERCC2 single nucleotide polymorphisms and the risk of cancer: a HuGE review. Am. J. Epidemiol. (2006) 164:297–302.
  6. Mohrenweiser HW, et al. Identification of 127 amino acid substitutions variants in screening 37 DNA repair genes in humans. Cancer Epidemiol. Biomarkers Prev. (2002) 11:1054–1064.
  7. Livingston RJ, et al. Pattern of sequence variation across 213 environmental response genes. Genome Res. (2004) 14:1821–1831.
  8. Packer BR, et al. SNP500Cancer: a public resource for sequence validation, assay development, and frequency analysis for genetic variation in candidate genes. Nucleic Acids Res. (2006) 34:D617–D621.
  9. Sakiyama T, et al. Association of amino acid substitution polymorphisms in DNA repair genes TP53, POLI, REV1 and LIG4 with lung cancer risk. Int. J. Cancer (2005) 114:730–737.
  10. Freimuth RR, et al. Polymorphism discovery in 51 chemotherapy pathway genes. Hum. Mol. Genet. (2005) 14:3595–3603.
  11. Dizier MH, et al. Genome screen in the French EGEA study: detection of linked regions shared or not shared by allergic rhinitis and asthma. Genes Immun. (2005) 6:95–102.
  12. Takahashi M, et al. Automated identification of single nucleotide polymorphisms from sequencing data. J. Bioinform. Comput. Biol. (2003) 1:253–265.
  13. Gunderson KL, et al. A genome-wide scalable SNP genotyping assay using microarray technology. Nat. Genet. (2005) 37:549–554.
  14. Emigh TH. Comparison of tests for Hardy-Weinberg equilibrium. Biometrics (1980) 36:627–642.
  15. Benhamou S, et al. Association between lung cancer and microsomal epoxide hydrolase genotypes. Cancer Res. (1998) 58:5291–5293.
  16. Jourenkova-Mironova N, et al. High-activity microsomal epoxide hydrolase genotypes and the risk of oral, pharynx, and larynx cancers. Cancer Res. (2000) 60:534–536.
  17. International Agency for Research on Cancer. Tobacco Smoking. Monographs on the Evaluation of the Carcinogenic Risk of Chemicals to Humans (1986) Vol. 38. Lyon: IARC Scientific Publications. 55.
  18. Benjamini Y, et al. Controlling the false discovery rate—a practical and powerful approach to multiple testing. J. R. Statist. Soc. Ser. B (1995) 57:289–300.
  19. Abecasis GR, et al. Linkage disequilibrium: ancient history drives the new genetics. Hum. Hered. (2005) 59:118–124.
  20. Storey JD, et al. Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. J. R. Statist. Soc. Ser. B (2004) 66:187–205.
  21. Lake SL, et al. Estimation and tests of haplotype-environment interaction when linkage phase is ambiguous. Hum. Hered. (2003) 55:56–65.
  22. Schaid DJ, et al. Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am. J. Hum. Genet. (2002) 70:425–434.
  23. Moullan N, et al. Polymorphisms in the DNA repair gene XRCC1, breast cancer risk, and response to radiotherapy. Cancer Epidemiol. Biomarkers Prev. (2003) 12:1168–1174.
  24. Shen M, et al. Polymorphisms of the DNA repair genes XRCC1, XRCC3, XPD, interaction with environmental exposures, and bladder cancer risk in a case-control study in northern Italy. Cancer Epidemiol. Biomarkers Prev. (2003) 12:1234–1240.
  25. Popanda O, et al. Specific combinations of DNA repair gene variants and increased risk for non-small cell lung cancer. Carcinogenesis (2004) 25:2433–2441.
  26. Barnes DE, et al. Mutations in the DNA ligase I gene of an individual with immunodeficiencies and cellular hypersensitivity to DNA-damaging agents. Cell (1992) 69:495–503.
  27. Bentley D, et al. DNA ligase I is required for fetal liver erythropoiesis but is not essential for mammalian cell viability. Nat. Genet. (1996) 13:489–491.
  28. Petrini JH, et al. Normal V(D)J coding junction formation in DNA ligase I deficiency syndromes. J. Immunol. (1994) 152:176–183.
  29. Clement V, et al. Suppression of UV-induced apoptosis by the human DNA repair protein XPG. Cell Death Differ. (2006) 13:478–488.
  30. Dunand-Sauthier I, et al. The spacer region of XPG mediates recruitment to nucleotide excision repair complexes and determines substrate specificity. J. Biol. Chem. (2005) 280:7030–7037.
  31. Cui Y, et al. Polymorphism of Xeroderma Pigmentosum group G and the risk of lung cancer and squamous cell carcinomas of the oropharynx, larynx and esophagus. Int. J. Cancer (2006) 118:714–720.
  32. Shen M, et al. Polymorphisms in the DNA nucleotide excision repair genes and lung cancer risk in Xuan Wei, China. Int. J. Cancer (2005) 116:768–773.
  33. Jeon HS, et al. Relationship between XPG codon 1104 polymorphism and risk of primary lung cancer. Carcinogenesis (2003) 24:1677–1681.
  34. Buermeyer AB, et al. Mammalian DNA mismatch repair. Annu. Rev. Genet. (1999) 33:533–564.
  35. Hays JB, et al. Discrimination and versatility in mismatch repair. DNA Repair (2005) 4:1463–1474.
  36. Wu J, et al. Mismatch repair processing of carcinogen-DNA adducts triggers apoptosis. Mol. Cell. Biol. (1999) 19:8292–8301.
  37. Erdeniz N, et al. Novel PMS1 alleles preferentially affect the repair of primer strand loops during DNA replication. Mol. Cell. Biol. (2005) 25:9221–9231.
  38. Kolodner RD, et al. Eukaryotic DNA mismatch repair. Curr. Opin. Genet. Dev. (1999) 9:89–96.
  39. Lim SE, et al. The mitochondrial p55 accessory subunit of human DNA polymerase gamma enhances DNA binding, promotes processive DNA synthesis, and confers N-ethylmaleimide resistance. J. Biol. Chem. (1999) 274:38197–38203.
  40. Chen D, et al. Age-dependent decline of DNA repair activity for oxidative lesions in rat brain mitochondria. J. Neurochem. (2002) 81:1273–1284.
  41. Guo S, et al. Regulation of replication protein A functions in DNA mismatch repair by phosphorylation. J. Biol. Chem. (2006) 281:21607–21616.
  42. Stojic L, et al. Mismatch repair and DNA damage signalling. DNA Repair (2004) 3:1091–1101.
  43. Cordell HJ, et al. Genetic association studies. Lancet (2005) 366:1121–1131.
  44. Ioannidis JP, et al. Genetic associations in large versus small studies: an empirical assessment. Lancet (2003) 361:567–571.
Received February 12, 2007; revised April 5, 2007; accepted April 30, 2007.





