Carcinogenesis Advance Access originally published online on February 22, 2008
Carcinogenesis 2008 29(4):772-778; doi:10.1093/carcin/bgn053
Comparison of induced and cancer-associated mutational spectra using multivariate data analysis
Cancer Informatics Group
1 Developmental Medicine Group, Institute of Life Science, Swansea Medical School, Swansea University, Singleton Park, Swansea SA2 8PP, UK
2 Molecular Epidemiology Unit, Leeds Institute for Genetics, Health and Therapeutics, The Light Laboratories, University of Leeds, Leeds LS2 9JT, UK
3 Pathology and Tumour Biology, Leeds Institute of Molecular Medicine, Wellcome Trust Brenner Building, St James's University Hospital, Leeds LS9 7TF, UK
* To whom correspondence should be addressed. Tel: +44 1792 295222; Fax: +44 1792 13430; Email: p.d.lewis{at}swansea.ac.uk
| Abstract |
|---|
One of the most useful tools for investigating the aetiopathology of cancer is the mutation spectrum, which comprises the type and distribution of mutations within a gene sequence. Many studies have generated mutagen-induced spectra using in vitro or in vivo model systems in an attempt to find correlations with those observed in cancer-associated genes such as the TP53 tumour suppressor gene. Consequently, meaningful similarities in the types of mutation found in induced and human spectra have been demonstrated. However, it is more difficult to draw such conclusions about the distribution or sequence context of mutations when they arise in different target sequences. We have developed an analytical approach for base substitution spectra that captures information for both sequence context and mutation type simultaneously. The resulting mutation signature is a fixed set of data points that allows comparison of multiple mutation spectra regardless of sequence. We have applied this method to a mixed set of mutation spectra observed in exons 5, 7 and 8 of TP53 from cancers of brain, breast, skin, colon, oesophagus, liver, head and neck, stomach and lung (smokers and non-smokers) and spectra induced by benzo[a]pyrene diol epoxide, ultraviolet (UV) B, UVC, simulated sunlight and hydroxyl radicals in the cII, supF and yeast p53 model systems. We demonstrate that this approach allows human cancer and mutagen-induced signatures to be grouped together according to similarity. Specifically, the analysis reveals key differences between smoking- and non-smoking-related lung cancer for TP53 mutations and the mutability of CpG sites between exons in skin cancer.
Abbreviations: BPDE, benzo[a]pyrene diol epoxide; CPD, cyclobutane pyrimidine dimer; FA, fuzzy cluster analysis; PCA, principal components analysis; rdm, relative dinucleotide mutabilities; SC, silhouette coefficient; SCC, squamous cell carcinoma; UV, ultraviolet
| Introduction |
|---|
A common approach to investigate the aetiopathology of cancer is to induce mutations in target genes using in vitro or in vivo model systems and compare the resulting mutation spectra with mutations observed in tumours. A mutation spectrum may be defined as a data set describing the distribution or type of mutations observed for a given DNA sequence. When comparing a mutagen-induced spectrum and a tumour spectrum, a major drawback is that, due to the sequences of the two genes usually being completely different, only the types of mutations may be directly compared. For base substitutions, there are only six types of mutation, two transitions (G:C → A:T and A:T → G:C) and four transversions (G:C → T:A, G:C → C:G, A:T → T:A and A:T → C:G), for which the frequencies may be compared. However, many mutagens may induce a similar mutation type spectrum but display a different context with respect to the sequence being targeted and ultimately the distribution of mutations. Thus, there are obvious limitations when using mutation type data alone to draw inferences regarding the cause of mutations observed in a disease gene. Despite the obvious difficulty in comparing mutation distribution between two non-homologous genes, correlations between mutagen-induced and disease-related mutations may be obtained at a finer level. Along a gene sequence, the likelihood of a mutation occurring at a specific site is sequence dependent. Local sequence context influences the probability of mutations occurring by modulating accessibility and binding of the mutagen, polymerase fidelity and efficiency of repair (1,2).
Cooper et al. (3), using mutations associated with human disease, demonstrated that neighbouring bases may have a profound effect on the likelihood of mutation. Numerous mutagenesis studies have demonstrated the effects of local sequence context on mutation rate for a variety of mutagens including benzo[a]pyrene diol epoxide (BPDE), methylnitrosourea and 2-amino-1-methyl-6-phenylimidazo[4,5-b]pyridine (1,4,5). Denissenko et al. (6) have observed a correlation between the position of BPDE adduct sites and mutation hot spots, mostly at CpG dinucleotides, in the TP53 gene in lung cancer. Therefore, in addition to considering mutagen-specific mutation types, it should also be possible to describe a mutagen by the dinucleotide target sequence and use the combined information to make predictions about the aetiology of mutations observed in a disease gene.
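The six base substitution types listed above arise because each of the 12 possible single-base changes is equivalent to one change on the complementary strand. A minimal sketch of that collapse follows; the function names and the `>` separator in the labels are illustrative choices and are not taken from the paper or from iMUSE.

```python
# Collapse the 12 single-base changes into the six strand-symmetric classes.
# Function names and label format are hypothetical, chosen for illustration.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PURINES = {"A", "G"}


def substitution_class(ref: str, alt: str) -> str:
    """Return a label such as 'G:C>A:T' for a ref -> alt base change."""
    if ref == alt or ref not in COMPLEMENT or alt not in COMPLEMENT:
        raise ValueError(f"not a base substitution: {ref}>{alt}")
    # Report every change from the strand on which the original base is G or A.
    if ref in ("C", "T"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}:{COMPLEMENT[ref]}>{alt}:{COMPLEMENT[alt]}"


def is_transition(ref: str, alt: str) -> bool:
    """Transitions keep the base within purines or within pyrimidines."""
    return (ref in PURINES) == (alt in PURINES)


all_classes = sorted(
    {substitution_class(r, a) for r in "ACGT" for a in "ACGT" if r != a}
)
```

Here `all_classes` holds the two transition and four transversion labels, so a C to T change and a G to A change are counted under the same G:C to A:T type.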
Surprisingly, little effort has been applied to developing robust methods to correlate mutation information from model systems to mutations observed in disease genes for the purposes of investigating the aetiology of disease. The author has previously used dinucleotide mutabilities to compare spontaneous mutation spectra in the supF gene (7). This approach was developed further for predicting mutation hot spots in the TP53 gene in lung tumours by extrapolating dinucleotide mutability data from the supF gene exposed to BPDE (8). What is required is a method to compare mutagen-induced mutation spectra of interest against a panel of cancer spectra. To permit the use of data from any model system, such a method should first allow for mutagen-induced and cancer-specific spectra to be derived from different gene sequences. Second, the method should allow the spectra to be grouped by similarity in mutation signature (both sequence context and types). The association between spectra in a group would be stronger than that between spectra in different groups. Finally, a measure of the fit of each signature to each group should be quantifiable.
Multivariate data analysis methods, in particular cluster analysis and data reduction, have now become mainstream methods of analysis for high-throughput biological data such as that generated by expression microarray and proteomics (9). Such methods have also been applied to reveal underlying groupings within sets of mutation spectra for the same DNA sequence (10,11). Lewis et al. (11) successfully applied multivariate data analysis methods to determine subgroupings of ultraviolet (UV)-induced and spontaneous mutation spectra in the supF gene in human, monkey and mouse cell lines and give insight into why such groups exist.
We present here a novel strategy that allows, for the first time, comparison of mutation spectra from cancer genes and mutation assays based on both sequence context and mutation types simultaneously. Our method creates mutation signatures for spectra by capturing mutation type and sequence context information simultaneously. Multivariate statistical methods are then applied to the signatures to reveal the structure of the relationships between spectra by assigning both cancer-specific and mutagen-induced spectra to distinct groups. Technically, the process involves partitioning the signatures into discrete groups using fuzzy cluster analysis (FA) and visualizing the structure of signature similarities by principal components analysis (PCA). We have applied the method to a mixed set of TP53 mutation spectra in exons 5, 7 and 8 for brain, breast, skin, colon, oesophageal, liver, head and neck, stomach and lung (smokers and non-smokers) cancers, with BPDE, UVB, UVC, simulated sunlight and hydroxyl radical (OH) induced spectra from the cII, supF and yeast p53 model systems. We demonstrate that our approach allows cancer and mutagen-induced signatures to be grouped together according to similarity to provide insight into disease aetiology.
| Methods |
|---|
Base substitution data
Benzo[a]pyrene, UVB, UVC and simulated sunlight-induced base substitution data were obtained from the literature for the cII gene in Big Blue mouse transgenic model system (12–15) and the supF gene in XP-A fibroblast (12) or the human kidney Ad293 cell lines (B. Manshian, unpublished data). In addition, spectra were obtained for BPDE (16, P.A. Burns, unpublished data), UV/simulated sunlight (16,17) and OH radicals (P.A. Burns, unpublished data) using the yeast p53 functional mutation assay (18). p53 gene cancer-specific base substitution data were extracted from the International Agency for Research on Cancer TP53 database, version 11 (19), for p53 exons 5, 7 and 8 (only these exons had sufficient data) in brain, breast, skin, colon, oesophageal, stomach, lung, liver and head and neck tumours. Lung cancer data were split into squamous cell carcinoma (SCC) in smokers, adenocarcinoma in smokers and adenocarcinoma in non-smokers. Skin cancer data were split into basal cell carcinoma and SCC. Details for each study and specific tumour pathology are shown in Table I. In total, 12 mutagen-induced signatures were generated (four benzo[a]pyrene, seven UVB/UVC/simulated sunlight and one hydroxyl radical). Three data sets were created where each included 12 TP53 exon-specific cancer signatures (supplementary Table 1). The mutagen-induced signatures were then added to each data set. All mutation signatures were assigned an identity code as shown in Table I.
Data analysis using iMUSE software
We have previously developed an online software package called iMUSE (P.D. Lewis, B. Manshian, P.A. Burns, in preparation), available at http://137.44.25.44/path/imuse.aspx, for multivariate data analysis of mutation spectra. iMUSE can upload multiple base substitution spectra of different gene sequences in a single file and transform the spectra into mutation signatures, via an algorithm called Dmat, for multivariate statistical analysis. A tutorial may be found at http://137.44.25.44/path/imuse/overview.aspx. Three separate files of mutation spectra (for exons 5, 7 and 8) were uploaded to iMUSE for multivariate statistical analysis. Brief descriptions of the applied methodologies follow.
Transformation of mutation spectra—the Dmat algorithm
To compare base substitution spectra from different gene sequences, iMUSE captures both mutation type and sequence context information at the dinucleotide level. Sixteen different dinucleotides (AA, GT, CG, etc.) are possible in a DNA sequence and the frequency of each can be calculated for a given polynucleotide length. It is possible to calculate the mutability of each type of dinucleotide in a mutation spectrum (7). For each dinucleotide, the expected number of substitutions is equal to the product of dinucleotide frequency and total number of substitutions in the spectrum. One can then calculate the ratio of observed to expected mutations at each dinucleotide. However, dinucleotide types occur at different frequencies in a gene sequence and one must be careful that a dinucleotide does not appear highly mutable simply because it is over-represented in the sequence. Therefore, relative dinucleotide mutabilities (rdm) are calculated which are the mutabilities that one would observe if all dinucleotides had an equal frequency. The rdm scores are calculated as follows:
[Equation: definition of the rdm score; not preserved in this copy]
One can calculate rdm scores for each base substitution type for all possible dinucleotides (e.g. G:C → T:A at GC and A:T → T:A at AA) and these scores are converted to a percentage for each spectrum. This transformation yields a mutation signature of 42 scores allowing direct comparison of the same or different DNA sequences accounting for mutation type and sequence context simultaneously.
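The calculation described in words above can be sketched as follows for a single mutation type. This is not the Dmat algorithm itself: the rescaling step, in which each observed/expected ratio is expressed as a percentage of the sum of all ratios, is only an assumed reading of "the mutabilities that one would observe if all dinucleotides had an equal frequency", and the function names and toy sequence are hypothetical. Observed counts per dinucleotide are supplied by the caller, so no assumption is made about how a mutated base is assigned to a dinucleotide.

```python
from collections import Counter


def dinucleotide_frequencies(sequence: str) -> dict[str, float]:
    """Fraction of each overlapping dinucleotide in the target sequence."""
    seq = sequence.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts.values())
    return {dinuc: n / total for dinuc, n in counts.items()}


def relative_mutabilities(sequence: str, observed: dict[str, int]) -> dict[str, float]:
    """Observed/expected ratio per dinucleotide, rescaled to percentages.

    ASSUMPTION: the percentage rescaling stands in for the paper's rdm
    formula, which is not reproduced here.
    """
    freqs = dinucleotide_frequencies(sequence)
    total_substitutions = sum(observed.values())
    if total_substitutions == 0:
        raise ValueError("spectrum contains no substitutions")
    missing = set(observed) - set(freqs)
    if missing:
        raise ValueError(f"dinucleotides absent from sequence: {sorted(missing)}")
    ratios = {}
    for dinuc, freq in freqs.items():
        # Expected count = dinucleotide frequency x total substitutions.
        expected = freq * total_substitutions
        ratios[dinuc] = observed.get(dinuc, 0) / expected
    scale = sum(ratios.values())
    return {dinuc: 100.0 * ratio / scale for dinuc, ratio in ratios.items()}


# Toy example: CG makes up 2 of the 5 dinucleotides but carries 4 of 5 mutations.
scores = relative_mutabilities("ACGCGT", {"CG": 4, "GC": 1})
```

In the toy example CG scores twice as high as GC even after its over-representation in the sequence has been taken into account, which is the effect the correction is meant to preserve.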
Multivariate statistical analysis of transformed spectra
Transformed mutation signatures may be analysed using multivariate statistical methods to reveal how individual signatures group together according to similarity. The term multivariate analysis applies to all statistical methods that simultaneously analyse multiple measurements on each object under investigation (20). Objects here are the mutation signatures, whereas the multiple measurements are the mutation type rdm scores. We have previously described the application of PCA and cluster analysis to mutation spectra (11), and detailed methodologies are described by Hair et al. (20). Here, we describe an approach that utilizes transformed mutation signatures to (i) determine the number of signature groups that exist in a data set using FA and (ii) visualize the relationship structure between signatures using PCA.
Fuzzy cluster analysis
Partition clustering methods assign objects in a data set to distinct clusters. Fuzzy clustering (21) allows for some ambiguity in the data, which often occurs in practice. In FA, each signature is spread over various clusters and the degree of belonging of a signature to different clusters is quantified by means of membership coefficients, which range from 0 to 1. Membership coefficients reveal how each signature fits into each group (where the sum of membership coefficients for a signature across all clusters is 1). For instance, for three clusters, a signature may be assigned membership coefficients of 0.72 to cluster A, 0.21 to cluster B and 0.07 to cluster C meaning that the signature has a 72% membership in cluster A and can be assigned to that group with some confidence.
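To make the idea of membership coefficients concrete, the sketch below computes them for one signature from its distances to a set of cluster centres, using the standard fuzzy c-means membership rule. This is an assumption made for illustration: the fuzzy method cited as (21) and implemented in iMUSE may use a different objective, and the function name and `fuzzifier` parameter are hypothetical.

```python
def membership_coefficients(distances: list[float], fuzzifier: float = 2.0) -> list[float]:
    """Fuzzy c-means memberships of one object, given its distance to each centre.

    u_i = d_i**(-p) / sum_j d_j**(-p), with p = 2 / (fuzzifier - 1).
    The coefficients lie between 0 and 1 and sum to 1.
    """
    if fuzzifier <= 1.0:
        raise ValueError("fuzzifier must be greater than 1")
    # An object sitting exactly on one or more centres belongs fully to them.
    on_centre = [d == 0 for d in distances]
    if any(on_centre):
        share = 1.0 / sum(on_centre)
        return [share if hit else 0.0 for hit in on_centre]
    power = 2.0 / (fuzzifier - 1.0)
    weights = [d ** -power for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


# A signature that is closest to cluster A, further from B and furthest from C.
memberships = membership_coefficients([1.0, 2.0, 4.0])
```

With these hypothetical distances the signature receives roughly 0.76, 0.19 and 0.05, a pattern comparable to the 0.72, 0.21 and 0.07 example in the text.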
In addition to membership coefficients, FA produces a silhouette coefficient (SC) that may be used to assess the robustness of clusters. An SC score >0.51 suggests that a reasonable structure has been found, whereas a score >0.71 suggests a strong structure. By evaluating SC scores for different cluster numbers, the user may quickly determine the optimal number of clusters that exist in the data set. Thus, FA allows values to be assigned for how signatures fit into clusters and the overall quality of the cluster structures. Furthermore, PCA and FA may be combined by superimposing FA clusters on the scatter plots to obtain a consensus result. For this study, we selected the fuzzy clustering option and correlation distance measure in iMUSE.
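The use of SC scores to judge a partition can be sketched as below. The code computes the average silhouette width of a hard assignment under a correlation distance (one minus the Pearson correlation) and applies the 0.51 and 0.71 thresholds quoted above. It is a generic illustration with hypothetical function names and toy data, and it does not claim to reproduce how iMUSE derives SC from a fuzzy partition.

```python
from math import sqrt


def correlation_distance(x: list[float], y: list[float]) -> float:
    """1 - Pearson correlation between two profiles of equal length."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / (sd_x * sd_y)


def silhouette_coefficient(points: list[list[float]], labels: list[int]) -> float:
    """Average silhouette width over all points for a hard cluster assignment."""
    n = len(points)
    dist = [[correlation_distance(p, q) for q in points] for p in points]
    widths = []
    for i in range(n):
        by_cluster: dict[int, list[float]] = {}
        for j in range(n):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist[i][j])
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            widths.append(0.0)  # singleton cluster, or only one cluster present
            continue
        a = sum(own) / len(own)                                   # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())     # separation
        widths.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(widths) / n


def describe_structure(score: float) -> str:
    """Interpretation thresholds as quoted in the text."""
    if score > 0.71:
        return "strong"
    if score > 0.51:
        return "reasonable"
    return "weak"


# Toy profiles: two rising and two falling, clustered correctly and incorrectly.
profiles = [[1, 2, 3, 4], [2, 4, 6, 9], [4, 3, 2, 1], [9, 6, 4, 2]]
good_score = silhouette_coefficient(profiles, [0, 0, 1, 1])
bad_score = silhouette_coefficient(profiles, [0, 1, 0, 1])
```

Repeating the calculation for each candidate number of clusters and keeping the partition with the highest score mirrors the selection procedure described above.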
Principal components analysis
PCA is a statistical method routinely used to analyse interrelationships among large numbers of variables revealing common underlying factors or components. PCA examines the correlations between the original variables and condenses the information contained within these variables into a smaller group of components with minimal loss of information. Thus, PCA could reduce a large group of individual mutation signatures into smaller groups (components) of related spectra. The association of a variable (i.e. signature) with each component depends on the correlation values (loadings) calculated by PCA and the variance shared by the component and variable is equal to the square of the correlation. The first component represents the best linear combination of variables. So, the first component may be described as the single best summary of linear relationships within the data. The second component is derived from the proportion of the variance remaining after the first component has been extracted and is therefore the second best linear combination of variables, and so on. There is no intercorrelation between components.
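The relationship between loadings and shared variance can be illustrated with a small sketch that extracts only the first component from a correlation matrix by power iteration. Power iteration is used here purely to keep the example dependency-free and is not a description of how iMUSE performs PCA; the function names and the three toy variables are hypothetical.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two variables of equal length."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def first_component(matrix: list[list[float]], iterations: int = 500):
    """Leading eigenvalue and loadings of a correlation matrix.

    Loadings are the eigenvector scaled by sqrt(eigenvalue), i.e. the
    correlation of each variable with the component. Their overall sign
    is arbitrary.
    """
    n = len(matrix)
    vec = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        nxt = [sum(row[j] * vec[j] for j in range(n)) for row in matrix]
        norm = sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    eigenvalue = sum(
        vec[i] * sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)
    )
    loadings = [v * sqrt(eigenvalue) for v in vec]
    return eigenvalue, loadings


# Three toy variables: two rise together, the third falls.
variables = [[1, 2, 3, 4, 5], [2, 4, 5, 4, 5], [5, 3, 4, 2, 1]]
corr = [[pearson(a, b) for b in variables] for a in variables]
eigenvalue, loadings = first_component(corr)
shared_variance = [loading ** 2 for loading in loadings]
```

Each entry of `shared_variance` is the fraction of one variable's variance captured by the first component, and the entries add up to the component's eigenvalue.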
Criteria are required to help decide how many components describe most of the variation within the variables and should be retained. It was mentioned above that the square of the loading of a variable on a component is equal to the variance shared by the two. Summing the squared loadings of all variables on a given component gives a value called the eigenvalue of that component. The broken stick method has been recommended as a stopping rule in PCA (22), where components should be retained as long as observed eigenvalues are higher than corresponding random broken stick components (23). Loadings of signatures and rdm scores on individual components may be visualized together using scatter plots. These plots reveal the relationships between signatures and rdm scores.
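The broken stick stopping rule can be written down in a few lines. Under the broken stick model, the expected share of total variance for the k-th of p components is (1/p) multiplied by the sum of 1/i for i from k to p. The sketch below compares observed eigenvalue shares against these expectations and stops at the first component that fails; the function names and the example eigenvalues are hypothetical and are not values from this study.

```python
def broken_stick_proportions(p: int) -> list[float]:
    """Expected variance share of components 1..p under the broken stick model."""
    return [sum(1.0 / i for i in range(k, p + 1)) / p for k in range(1, p + 1)]


def components_to_retain(eigenvalues: list[float]) -> int:
    """Count leading components whose variance share exceeds the expectation."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    expected = broken_stick_proportions(len(ordered))
    retained = 0
    for observed, threshold in zip(ordered, expected):
        if observed / total <= threshold:
            break  # stop at the first component that does not beat the model
        retained += 1
    return retained


# Hypothetical eigenvalues for a four-variable analysis.
retain_one = components_to_retain([2.6, 0.9, 0.4, 0.1])
retain_two = components_to_retain([2.2, 1.2, 0.5, 0.1])
```

In the second example the first two components exceed their broken stick expectations (0.55 versus 0.52 and 0.30 versus 0.27) and the third does not (0.125 versus 0.146), so two components would be retained.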
| Results |
|---|
Data for mutation signatures generated by iMUSE may be viewed in supplementary Table 2 (http://137.44.25.44/path/imuse/p53.aspx). Interestingly, a quick survey of the signatures across exons reveals variation for mutation types occurring at different dinucleotides. One such example is the mutability of CpG sites for G:C → A:T transitions in oesophageal and head and neck cancer. The CpG rdm scores for G:C → A:T mutations in exon 5 for oesophageal and head and neck tumours are 12% each, rising to 36% each in exon 7 and 21% in exon 8. Similar patterns were observed for other cancer types.
A major objective of the study was to establish similarities between TP53 gene cancer-specific mutation signatures and mutagen-specific signatures from mutation assays. Therefore, it was decided to interpret results of the analysis individually for each TP53 exon. The first stage of the analysis involved partitioning signatures into discrete clusters using FA. In order to establish the optimal number of FA clusters, SC scores were determined for between two and six clusters (Tables II and III). For exon 5, the optimal number of clusters was three with an SC score of 0.57 confirming reasonable structure and good fit of signatures within these clusters. Similarly, SC scores (0.65 and 0.64, respectively) for exons 7 and 8 again predicted three clusters of signatures with good structure.
With the number of clusters established, PCA was then applied to the data to reveal the relatedness between cancer and mutagen signatures within and between groups. The number of components (PC) to be retained from the PCA analysis for exons 5, 7 and 8 was determined by referring to the scree plots produced by iMUSE (Figure 1). The scree plot shows eigenvalues for the first four components for each of exons 5, 7 and 8. In each exon, the first component (PC1) contains a large percentage of the variation with an eigenvalue far greater than that for the broken stick component. Similarly, PC2 and PC3 have eigenvalues greater than corresponding broken stick values; however, this is not the case for PC4 and so the first three components are retained for further analysis for all exons. The number of components retained was in agreement with the number of optimal FA clusters for each exon. How signatures correlate with components can be visualized using scatter plots of the signature loadings (Figures 2, 3 and 4).
Signatures with similar loadings for pairs of components will group together in the plot—indeed, one can interpret the plots as a pulling together of signatures by similarity. Plots are shown for PC1 versus PC2 and PC1 versus PC3 for each exon. Loadings for rdm scores for mutation types on each component are also plotted.
Exon 5
FA membership scores for signatures reveal strong membership of UV and both skin cancer signatures in cluster 3 (Tables II and III). BPDE, OH, lung adenocarcinoma in smokers, lung SCC in smokers and liver cancer all group in cluster 2. Cluster 1 contains all other cancer signatures. Signatures for both liver cancer and lung SCC in smokers have a low membership score with cluster 2, although lung adenocarcinoma in smokers shares, with BPDE, high membership and indeed higher membership than the OH signature. PCA also shows evidence of three groups of signatures when assessing all three components via both scatter plots (Figure 2A and B). Signatures are encircled in Figure 2B according to the FA clusters that they fall into.
All UV signatures and both skin cancer signatures have a higher positive correlation with PC1 relative to other signatures and a positive correlation with PC2. The plots further reveal that individual rdm scores for G:C → A:T have similar positive correlations on PC2 to UV and skin cancer signatures at TC (read as G→A.ga on the plot), CT (G→A.ag) and CC (G→A.gg). The grouping of the UV-induced signatures also appears extremely tight, suggesting very good correlation between mutation assays for UV-induced mutation signatures. Due to the condensed clustering of UV and skin cancer signatures, it is difficult to draw conclusions as to which assay gene, harbouring mutations induced by either UVB, UVC or sunlight, shows the highest similarity in mutation signature to either skin cancer type.
All signatures for BPDE, OH and lung adenocarcinoma in smokers have a higher negative correlation with PC3 and form a distinct group as shown by FA cluster 2. Although the correlation on PC3 is lower for signatures of lung SCC in smokers and liver cancer, these signatures do group very loosely with BPDE and OH. All other cancer types including lung adenocarcinoma in non-smokers have a positive correlation on PC3 and have an association with G:C → A:T mutations. The close correlation between BPDE, OH and lung adenocarcinoma in smokers can be explained by a similar pattern of loadings of rdm scores on PC3 for G:C → T:A mutations at dinucleotides that include CpG, AG, GA and GT. When considering loadings on all components, the signatures for BPDE spectra in the cII gene and the p53 gene from the yeast functional assay show the most association with lung adenocarcinoma in smokers.
Exon 7
The three FA clusters of signatures for exon 7 (Tables II and III) show a different grouping pattern to that observed for exon 5. Once again, UV signatures form a distinct group with high membership scores for cluster 3. However, both skin cancer signatures group strongly with all other cancer types in cluster 1 with the exception of lung SCC in smokers and liver. The latter two cancer signatures group with BPDE and OH signatures in cluster 2 with liver having a very low membership score. PCA scatter plots show the associations between signatures in exon 7 (Figure 3A and B).
Both skin cancer signatures for exon 7 appear to differ from UV-induced signatures due to differences in proportions of G:C → A:T mutations at CpG sites. The incidence of G:C → A:T mutations at CpG sites in mutation assay genes is much lower relative to the incidence observed in skin tumours. This observation contrasts with the pattern for exon 5 and clearly highlights differential mutability of CpG sites between these two exons when considering the potential for UV damage. A clear difference also exists between exons 5 and 7 for the association between BPDE, OH and lung cancer types. A higher proportion of G:C → T:A mutations at dinucleotides including CpG means the signature of lung SCC in smokers groups more closely with that of BPDE, with OH having a slightly lesser association with this cancer type. The loading of the liver signature with PC3 means liver groups away from the BPDE signatures although by FA membership it is still classed as within the same cluster. Therefore, when considering TP53 exons 5 and 7, differences exist in similarities between signatures for smoking-associated lung cancer types and BPDE mutagens as well as between UV and skin cancers.
Exon 8
Although differences existed between exons 5 and 7 for signature patterns, the FA cluster membership for exon 8 is remarkably similar to exon 5 with only one exception. Cluster 3 contains UV-induced and skin cancer signatures with high membership scores. Cluster 2 contains signatures for BPDE, OH and smoking-associated lung adenocarcinoma and lung SCC. Cluster 1 contains all other cancer types including liver, with a very high membership score, and lung adenocarcinoma in non-smokers with a low membership score.
PCA scatter plots (Figure 4A and B) for the exon 8 data show UV-induced and skin cancer signatures to have a negative correlation with PC2 due to a higher relative proportion of G:C → A:T mutations at GA and GG sites. BPDE, OH and smoking-associated lung adenocarcinoma and SCC all have a positive correlation with PC3 and a low correlation with PC1. This pattern mirrors that for G:C → T:A mutations at CpG, GG, GT and GA sites. Closer inspection of PC2 and PC3 in Figure 4B reveals lung adenocarcinoma signature in smokers to be more similar to the BPDE signature than OH. As was observed for exon 5, the most similar BPDE signatures to lung adenocarcinoma were those for the cII gene and the p53 gene from the yeast functional assay.
| Discussion |
|---|
A key objective of this study was to evaluate a method of combining transformed mutation spectra from mutation assay and cancer genes, accounting for mutation type and sequence context, with multivariate statistical methods to create explainable grouping patterns of the resulting mutation signatures. The strategy can be assessed by considering the variation in signatures for each mutagen included in the analysis that had spectra from different gene sequences. The cII, supF and p53 functional assay genes differ extensively regarding frequencies of dinucleotides and sequence length. In data sets that included TP53 signatures from diverse cancers, UV- and BPDE-induced signatures showed very little within-mutagen variation for spectra derived from different genes. Thus, a signature obtained from rdm scores for each mutation type provides, in a fixed set of scores, the necessary information for comparison of mutation spectra from different gene sequences. The combination of FA and PCA applied to the signature data in this study allowed for visualization of the relatedness between spectra and robust grouping of signatures according to similarity.
The pattern of similarities in mutation signatures in the entire data set may be summarized overall by three groups corresponding to UV, BPDE/OH and cancer signatures. However, the non-uniform grouping patterns for signatures in the TP53 gene yield insight into potential differences in sequence context for mutations across exons in certain cancers. The almost identical grouping pattern of signatures for exons 5 and 8 (with the exception of liver cancer) reflects similar TP53 sequence context mutability for these exons. Interestingly, the grouping pattern for signatures in the exon 7 data set differed from the patterns observed in the other exons for both skin and lung cancers.
It is well documented that UV radiation causes damage, particularly C → T (or tandem C:C → T:T) transitions, at adjacent pyrimidines in DNA (24). The most common photoproducts are cyclobutane pyrimidine dimers (CPDs), which represent about three-quarters of the photoproducts, whereas the remaining non-dimer photoproducts consist mostly of 6–4 pyrimidine–pyrimidine lesions. Both types of lesion are caused by UVB and UVC. Yoon et al. (25) demonstrated that CPDs are between 20 and 40 times more frequent than other DNA photoproducts when induced with simulated sunlight. Evidence provided by You et al. (13) suggests that CPDs are responsible for the majority of UVB-induced mutations in repair-proficient mammalian cells. Furthermore, their UVB signature, included in our analysis (UBc), groups closely with all other UVB, UVC and skin cancer signatures for exons 5 and 8, which is explained by relatively higher levels of G:C → A:T mutations primarily at TC and CC sites and to a lesser extent at CT sites. However, the grouping patterns for exon 7 show both basal cell carcinoma and SCC signatures in skin to group away from UV signatures. This is due to a relatively higher level of G:C → A:T mutations at CpG sites. You et al. (14) argue that a considerable fraction of mutations induced by sunlight are caused by CPDs forming preferentially at dipyrimidine sequences with 5-methylcytosine. This certainly holds for exon 7 where there is a ratio of about four CC sites to one CpG highlighting a higher relative mutability for CpG sites here. In exon 5, there is a ratio of two CC sites to one CpG and this ratio is similar to that of exon 8. In exon 7, there is also a ratio of four TC sites to one CpG but TC occurs at lower frequencies than CpG in both exons 5 and 8. It is noteworthy that the two CpG sites in exon 7, at codons 245 and 248, both have a C in the 5' position of the mutable C as do those at codons 197 (exon 6) and 282 (exon 8). All four sites are mutational hot spots in skin cancer and susceptible to a 15-fold increase in CPD formation when exposed to natural sunlight (26). However, if sunlight induces a mutation frequency higher at CpG sites relative to CC and TC then this would be reflected in the simulated sunlight signature for the cII gene (SUNc). It is possible that a C in the 5' position to CpG sites enhances the mutability of CpG sites although CCG trinucleotides do exist at a number of other positions in exons 5, 7 and 8 that are not mutation hot spots. Therefore, the grouping of UV and skin spectra for mutation signatures in the TP53 gene raises more questions than answers about exon-specific dinucleotide mutability in skin cancer and the relevant mechanisms of DNA damage. What is certain from this study is that the mutability of CpG sites in TP53 exon 7 is higher than that for exons 5 and 8.
The association between BPDE and smoking-related lung tumour mutagenesis has been passionately debated over the last 10 years in the literature. Denissenko et al. (6) provided compelling evidence of correlation between BPDE adduct position and mutation hot spots (mostly at CpG dinucleotides) in the TP53 gene in lung cancer. A high proportion of TP53 mutations in lung cancers are C:G → A:T transversions, which is different to the pattern observed in all other cancers, with the exception of liver. Earlier analysis of mutation data in the International Agency for Research on Cancer TP53 database by Hainaut et al. (27) revealed that there is a significant difference for frequencies of G to T mutations between smoker- and non-smoker-specific lung cancer. Rodin et al. (28), however, had argued that because an excess of G to T mutations existed in both smoking- and non-smoking-related lung cancers then some primary cause other than BPDE adduct formation must exist that had a similar mutational signature. They further suggested that a logical candidate was oxidative DNA damage leading to further debate in the literature (27,29).
The hydroxyl radical is the most reactive of the oxyradical species and is considered to be responsible for a high proportion of cigarette smoke-mediated damage (30,31). By including BPDE, OH and all lung cancer type mutation signatures in our analysis, we were able to assess across all mutation types the potential interplay between these mutagens and lung cancers. In all exon analyses, BPDE and OH fell in the same cluster due to a higher proportion of G:C → T:A rdm scores for a number of G-containing dinucleotides particularly CpG and GG. For exons 5 and 8, the signature of lung adenocarcinoma in smokers grouped strongly with BPDE signatures particularly those of cII and the p53 yeast functional assay. The smoking-related lung SCC signature also grouped with BPDE and OH in both exons, weakly for exon 5 but with a higher degree of association for exon 8. The smoking-related lung SCC signature grouped more strongly with BPDE/OH when exon 7 was analysed although the signature for lung adenocarcinoma in smokers did not. Critically, the lung adenocarcinoma in non-smokers signature did not fall into the BPDE/OH grouping in any exon analysis.
The results of this analysis reveal that the OH mutational signature is similar to smoking-related lung cancer signatures in TP53 exons to the extent where they are hard clustered together using FA. However, it is clear and significant that the BPDE signatures are more similar to smoking-related lung cancer signatures (adenocarcinoma in exons 5 and 8 and SCC in exon 7) than those of OH. It is also interesting that liver cancer signatures displayed a higher proportion of G:C → T:A mutations relative to other cancers in non-smokers (including lung adenocarcinoma) and indeed grouped loosely with BPDE/OH for exons 5 and 7 but not 8. We suggest that when considering TP53 mutation signature differences (that include all mutation types) between smoking- and non-smoking-related lung cancers, it must be remembered that lung adenocarcinoma in non-smokers shows a lesser association with lung cancers in smokers than does liver cancer.
The application of multivariate statistical methods to mutation signatures derived from sequence context and mutation type data provides a robust method for assessing the similarities between base substitution spectra in cancer genes and mutation assays. By using the online iMUSE software package, one may rapidly screen large numbers of mutation spectra from different gene sequences. The results of our analysis demonstrate that this approach can highlight key similarities between disease- and mutagen-specific mutation patterns and sequence context. The approach is not limited to cancer genes and offers the possibility of exploring a multitude of existing disease gene spectra already published or residing in publicly available databases.
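To make the idea of a sequence-independent signature concrete, the following Python sketch builds a fixed-length vector from sequence context and mutation type and compares two such vectors. It is an illustration only and not the rdm score or the iMUSE implementation: the function names `signature` and `distance`, the choice of the 5' neighbour as the context, the normalisation by dinucleotide availability and the toy sequences are all assumptions made for this example. Strand orientation is ignored, and the factor analysis and clustering steps are not reproduced.

```python
from collections import Counter
from itertools import product
from math import sqrt

BASES = "ACGT"
# Assumed feature set: 16 dinucleotides (5' neighbour + mutated base)
# x 3 possible substituted bases = 48 features, whatever the target gene.
FEATURES = [(a + b, new)
            for a, b in product(BASES, repeat=2)
            for new in BASES if new != b]


def signature(target, mutations):
    """Return a 48-element signature for base substitutions in `target`.

    `mutations` is a list of (position, new_base) pairs; positions are
    0-based and must be >= 1 so that a 5' neighbour exists.
    """
    target = target.upper()
    # Number of times each dinucleotide is available in the target sequence.
    available = Counter(target[i - 1:i + 1] for i in range(1, len(target)))
    observed = Counter()
    for pos, new in mutations:
        observed[(target[pos - 1:pos + 1], new)] += 1
    # Mutations per available site (hypothetical normalisation), so that
    # targets with different base composition can be compared.
    rates = [observed[(ctx, new)] / available[ctx] if available[ctx] else 0.0
             for ctx, new in FEATURES]
    total = sum(rates)
    return [r / total for r in rates] if total else rates


def distance(sig_a, sig_b):
    """Euclidean distance between two signatures of equal length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))


# Toy data: G to T changes at CpG sites in two different target sequences,
# and a C to T change at a TC site in the first sequence.
sig_a = signature("ACGTCGGA", [(2, "T"), (5, "T")])
sig_b = signature("TTCGACGG", [(3, "T")])
sig_c = signature("ACGTCGGA", [(4, "T")])

print(distance(sig_a, sig_b))  # same context and mutation type: 0.0
print(distance(sig_a, sig_c))  # different context and mutation type
```

Because every spectrum is reduced to the same 48 features, the resulting vectors can be passed to any standard multivariate method; the Euclidean distance used here is simply the most basic stand-in for that step.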
| Acknowledgments |
|---|
Conflict of Interest Statement: None declared.
| References |
|---|
1. Page JE, et al. Sequence context profoundly influences the mutagenic potency of trans-opened benzo[a]pyrene 7,8-diol 9,10-epoxide-purine nucleoside adducts in site-specific mutation studies. Biochemistry (1998) 37:9127–9137.
2. Rogozin IB, et al. Theoretical analysis of mutation hotspots and their DNA sequence context specificity. Mutat. Res. (2003) 544:65–85.
3. Cooper DN, et al. The mutational spectrum of single base-pair substitutions causing human genetic disease: patterns and predictions. Hum. Genet. (1990) 85:55–74.
4. Burns PA, et al. Mutational specificity of MNU in the lacI gene of Escherichia coli. Carcinogenesis (1988) 9:1607–1610.
5. Shibutani S, et al. Mutagenesis of the N-(deoxyguanosin-8-yl)-2-amino-1-methyl-6-phenylimidazo[4,5-b]pyridine DNA adduct in mammalian cells. Sequence context effects. J. Biol. Chem. (1999) 274:27433–27438.
6. Denissenko MF, et al. Preferential formation of benzo[a]pyrene adducts at lung cancer mutational hotspots in p53. Science (1996) 274:430–432.
7. Lewis PD, et al. Spontaneous mutation spectra in supF: comparative analysis of mammalian cell line base substitution spectra. Mutagenesis (2001) 16:503–515.
8. Lewis PD, et al. In silico p53 mutation hot spots in lung cancer. Carcinogenesis (2004) 25:1–9.
9. Eisen MB, et al. Cluster analysis and display of genome-wide expression patterns. Proc. Natl Acad. Sci. USA (1998) 95:14863–14868.
10. Benigni R, et al. Multivariate statistical analysis of mutational spectra of alkylating agents. Mutat. Res. (1992) 267:77–88.
11. Lewis PD, et al. An exploratory analysis of multiple mutation spectra. Mutat. Res. (2002) 518:163–180.
12. Yoon J-H, et al. Methylated CpG dinucleotides are the preferential targets for G-to-T transversion mutations induced by benzo[a]pyrene diol epoxide in mammalian cells: similarities with the p53 mutation spectrum in smoking-associated lung cancers. Cancer Res. (2001) 61:110–117.
13. You Y-H, et al. Cyclobutane pyrimidine dimers are responsible for the vast majority of mutations induced by UVB irradiation in mammalian cells. J. Biol. Chem. (2001) 276:44688–44694.
14. You Y-H, et al. Similarities in sunlight-induced mutational spectra of CpG-methylated transgenes and the p53 gene in skin cancer point to an important role of 5-methylcytosine residues in solar UV mutagenesis. J. Mol. Biol. (2001) 305:389–399.
15. Lee D-H, et al. Deamination of 5-methylcytosines within cyclobutane pyrimidine dimers is an important component of UVB mutagenesis. J. Biol. Chem. (2003) 278:10314–10321.
16. Yoon J-H, et al. Simulated sunlight and benzo[a]pyrene diol epoxide induced mutagenesis in the human p53 gene evaluated by the yeast functional assay: lack of correspondence to tumor mutation spectra. Carcinogenesis (2003) 24:113–119.
17. Inga A, et al. Ultraviolet-light induced p53 mutational spectrum in yeast is indistinguishable from p53 mutations in human skin cancer. Carcinogenesis (1998) 19:741–746.
18. Fronza G, et al. The yeast p53 functional assay: a new tool for molecular epidemiology. Hopes and facts. Mutat. Res. (2000) 462:293–301.
19. Petitjean A, et al. Impact of mutant p53 functional properties on TP53 mutation patterns and tumor phenotype: lessons from recent developments in the IARC TP53 database. Hum. Mutat. (2007) 28:622–629.
20. Hair JF, et al. Multivariate Data Analysis (1995) New Jersey: Prentice Hall. 364–483.
21. Kaufman L, et al. Finding Groups in Data: An Introduction to Cluster Analysis (1990) New York: Wiley.
22. Jackson DA. Stopping rules in principal components analysis: a comparison of heuristical and statistical approaches. Ecology (1993) 74:2204–2214.
23. Pielou EC. Ecological Diversity (1975) New York: Wiley & Sons.
24. Kraemer KH. Sunlight and skin cancer: another link revealed. Proc. Natl Acad. Sci. USA (1997) 94:11–14.
25. Yoon J-H, et al. The DNA damage spectrum produced by simulated sunlight. J. Mol. Biol. (2000) 302:1019–1020.
26. Tommasi S, et al. Sunlight induces pyrimidine dimers preferentially at 5-methylcytosine bases. Cancer Res. (1997) 57:4727–4730.
27. Hainaut P, et al. Patterns of p53 G→T transversions in lung cancers reflect the primary mutagenic signature of DNA-damage by tobacco smoke. Carcinogenesis (2001) 22:367–374.
28. Rodin SN, et al. Human lung cancer and p53: the interplay between mutagenesis and selection. Proc. Natl Acad. Sci. USA (2000) 97:12244–12249.
29. Cooper CS. Smoking, lung cancers and their TP53 mutations. Mutagenesis (2002) 17:279–280.
30. Nakayama T, et al. Cigarette smoke induces DNA single-strand breaks in human cells. Nature (1985) 314:462–464.
31. Leanderson P, et al. Cigarette smoke-induced DNA damage in cultured human lung cells: role of hydroxyl radicals and endonuclease activation. Chem. Biol. Interact. (1992) 81:197–208.