DNA Methylation Profiling in Sarcoma Classification
Sarcomas are soft tissue and bone tumors that can arise in any part of the body and can affect people of all ages. They often grow in connective tissues, that is, tissues whose primary role is to support other tissues in the body. Currently, more than 70 types of sarcoma are recognized. Treatment varies depending on the tumor’s location, type, and other factors.
Sarcomas are a heterogeneous class of tumors. Many also lack defining histopathological features, which makes them hard to diagnose. To identify them correctly, pathologists often rely on detecting tumor-specific molecular alterations. In this study, the researchers demonstrated the classification of soft tissue and bone tumors by applying a machine learning classifier to array-generated DNA methylation data.
What is DNA methylation?
DNA methylation is a natural process in which methyl groups are added to the DNA molecule. This changes the activity of a DNA segment without changing its sequence. When it occurs in a gene promoter, DNA methylation usually represses gene transcription. This is why methylation marks are considered key epigenetic markers that are crucial in normal development and disease.
In cancer, DNA methylation patterns reflect both the cell type of origin and the changes acquired during tumor formation. Profiling of human brain tumors has demonstrated entity-specific methylation signatures and has led to the identification of several new subtypes.
Results
DNA methylation profiling of prototypical sarcomas. The researchers subjected prototypical cases of the most common soft tissue and bone tumors, non-mesenchymal tumors that might mimic mesenchymal differentiation (e.g. squamous cell carcinoma or melanoma), and non-neoplastic control tissue to DNA methylation profiling using the Infinium HumanMethylation 450K array.
Development of the sarcoma classifier. The researchers then developed a classification tool, the sarcoma classifier, using a Random Forest machine learning classification algorithm as previously described. Cross-validation, an internal performance metric, gave an estimated error rate of 1.95% for raw scores and a discriminating power of 99.9% as measured by the area under the receiver operating characteristic curve. The low rate of misclassification demonstrates the discriminating power of the classifier.
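The idea of a cross-validated error rate can be illustrated with a small sketch. This is not the researchers’ pipeline: to stay dependency-free it swaps in a simple nearest-centroid classifier for the Random Forest, runs on synthetic data, and all names (make_samples, centroids, nearest_centroid_predict, cross_validated_error, class_A to class_C) are hypothetical.

```python
import random

def make_samples(n_per_class, n_probes, seed=0):
    """Synthetic methylation-like values in [0, 1]; each class has its own mean profile."""
    rng = random.Random(seed)
    samples = []
    for label in ("class_A", "class_B", "class_C"):  # hypothetical class names
        profile = [rng.random() for _ in range(n_probes)]
        for _ in range(n_per_class):
            values = [min(1.0, max(0.0, m + rng.gauss(0, 0.05))) for m in profile]
            samples.append((values, label))
    return samples

def centroids(train):
    """Mean profile per class, computed from the training samples only."""
    sums, counts = {}, {}
    for values, label in train:
        acc = sums.setdefault(label, [0.0] * len(values))
        for i, v in enumerate(values):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def nearest_centroid_predict(cents, values):
    """Stand-in classifier (NOT a Random Forest): pick the closest class centroid."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(values, c))
    return min(cents, key=lambda lab: sqdist(cents[lab]))

def cross_validated_error(samples, k=5, seed=1):
    """Fraction of held-out samples misclassified across k folds."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    errors = 0
    for i, test in enumerate(folds):
        # Train on every fold except the one currently held out.
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        cents = centroids(train)
        errors += sum(nearest_centroid_predict(cents, v) != lab for v, lab in test)
    return errors / len(samples)

samples = make_samples(n_per_class=20, n_probes=50)
error_rate = cross_validated_error(samples)
print(f"estimated error rate: {error_rate:.2%}")
```

Each sample is predicted only by a model that never saw it during training, which is why the resulting error rate serves as an internal estimate of performance on new cases.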
Classifier performance validated in a clinical cohort. The performance of the sarcoma classifier was validated on 428 additional cases, representing relapsed and refractory bone tumor tissue, from patients enrolled in the MNP2.0, PTT2.0, INFORM, or NCT MASTER trials, which focus on molecular analysis (Supplementary Data 3).
Copy number profiling of sarcomas. The researchers generated copy number variation (CNV) plots for all sarcomas of the reference cohort as previously described. Frequently encountered alterations include MDM2 amplification in well-/dedifferentiated liposarcomas, MYC amplification in radiation-induced angiosarcoma, and segmental deletions on chromosome 22q encompassing SMARCB1 in rhabdoid tumors.
Discussion
The researchers created an open-access platform that allows the categorization of sarcomas based on machine-generated methylation data and algorithm-driven analysis. DNA methylation-based categorization has several attractive features.
Analyses can be performed on DNA extracted from formalin-fixed, paraffin-embedded tissue, which allows integration into routine diagnostic settings. This is a clear advantage over RNA expression profiling, which depends on fresh tumor tissue.
The detection of methylation patterns is particularly important for sarcoma entities that lack pathognomonic gene alterations such as entity-specific gene fusions. One-third of the sarcoma entities currently recognized by the classifier do not exhibit specific mutational events.
Heterogeneity at the DNA methylation level has been described between different tumors, and also within individual tumors in Ewing sarcoma. On the other hand, the same study reported close to 100% accuracy in distinguishing Ewing sarcoma from other cell types. Nevertheless, heterogeneity within individual tumors is at odds with the high stability that a parameter used for tumor classification requires.
The researchers describe high stability of methylation profiles for sarcoma entities. In addition, their selection process for the CpG sites included in the classification algorithm favors those that best distinguish between tumor entities. A practical example of the stability achieved by this approach has been presented for ependymoma, where primary and recurrent tumors from the same patient were placed next to each other in almost all instances upon unsupervised clustering.
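A selection step that favors CpG sites separating tumor entities could look roughly like the sketch below. The actual selection criterion is not given here, so the measure used (the spread between per-entity mean methylation values) is an assumption, as are the function name, the probe IDs, the entity labels, and the values.

```python
from statistics import mean

def rank_probes_by_class_separation(values_by_probe, labels, top_k):
    """Return the top_k probe IDs whose per-entity mean values differ the most."""
    def separation(values):
        by_class = {}
        for value, label in zip(values, labels):
            by_class.setdefault(label, []).append(value)
        class_means = [mean(vs) for vs in by_class.values()]
        # Assumed criterion: gap between the highest and lowest entity mean.
        return max(class_means) - min(class_means)

    ranked = sorted(values_by_probe,
                    key=lambda probe: separation(values_by_probe[probe]),
                    reverse=True)
    return ranked[:top_k]

# Hypothetical data: one methylation value per sample, in the same order as labels.
labels = ["entity_A", "entity_B", "entity_A", "entity_B"]
values_by_probe = {
    "probe_1": [0.10, 0.90, 0.12, 0.88],  # strongly entity-specific
    "probe_2": [0.50, 0.52, 0.49, 0.51],  # nearly constant across entities
    "probe_3": [0.20, 0.60, 0.25, 0.65],  # moderately entity-specific
}
selected = rank_probes_by_class_separation(values_by_probe, labels, top_k=2)
print(selected)
```

Probes that barely differ between entities, like probe_2 here, contribute little to classification and are dropped first under this criterion.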
Despite being conceptually attractive, the current version of the sarcoma classifier could not assign approximately 25% of the cases in the validation cohort to a DNA methylation class. There are several explanations. Foremost, in its current stage, the sarcoma classifier has not been trained to cover the entire spectrum of sarcoma subtypes. This accounts for a portion of the 106 of 428 cases that went unrecognized with a calibrated score <0.9. In addition, limited sample numbers for some entities do not yet allow methylation subclasses to be identified, as was done for the chondrosarcomas, which split into four subcategories. Increasing the number of cases in the reference set will very likely enable the detection of more methylation subgroups. A similar tendency has been observed in pilocytic astrocytomas and medulloblastomas, which now separate into several methylation subgroups, although the clinical impact remains unclear.
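The cut-off logic and the proportion of unassigned cases can be made concrete with a short sketch. Only the 0.9 cut-off and the 106/428 figure come from the text; the function name, class names, and example scores are hypothetical, and the sketch assumes the class with the highest calibrated score is the one tested against the cut-off.

```python
CUTOFF = 0.9  # calibrated score cut-off stated in the text

def assign_class(calibrated_scores, cutoff=CUTOFF):
    """Return the top-scoring class, or None if its score is below the cut-off."""
    best = max(calibrated_scores, key=calibrated_scores.get)
    return best if calibrated_scores[best] >= cutoff else None

# Hypothetical calibrated scores for two cases.
confident_case = {"class_A": 0.97, "class_B": 0.02, "class_C": 0.01}
ambiguous_case = {"class_A": 0.55, "class_B": 0.40, "class_C": 0.05}

print(assign_class(confident_case))  # assigned to its top class
print(assign_class(ambiguous_case))  # None, i.e. reported as unclassifiable

# Share of the validation cohort left without a methylation class.
unassigned_fraction = 106 / 428
print(f"{unassigned_fraction:.1%}")
```

The fraction 106/428 is about 24.8%, which matches the "approximately 25%" quoted above.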
Moreover, the DNA methylation-based approach depends on a fairly high tumor cell content in the samples. The researchers report the best results when tumor cells make up 70% or more of all cells in a sample. Many sarcomas, however, typically contain high proportions of non-neoplastic inflammatory cells. This may have contributed to classifier output scores below the cut-off of 0.9, causing the tumor to be reported as unclassifiable. The effect of tumor cell purity on classifier performance is likely to depend on the sarcoma subtype.
Recommendations
Further studies with larger case numbers are needed to properly characterize the effect of tumor purity on classifier performance. One possible way to address the problem is to subtract methylation patterns typical of lymphocytes, thereby accentuating the patterns of the respective sarcoma entities.
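One way such a subtraction might work is sketched below. It assumes a simple linear mixture model (observed value = purity × tumor value + (1 − purity) × lymphocyte value) with a known tumor purity and a known lymphocyte reference profile. The text proposes neither the model nor any of these names and numbers, so they are all illustrative assumptions.

```python
def subtract_lymphocyte_signal(observed, lymphocyte, purity):
    """Estimate tumor-only methylation values under an assumed linear mixture model.

    observed   : per-probe methylation values measured in the mixed sample
    lymphocyte : per-probe reference values for lymphocytes (assumed known)
    purity     : fraction of tumor cells in the sample, in (0, 1]
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    corrected = []
    for obs, lym in zip(observed, lymphocyte):
        # Solve obs = purity * tumor + (1 - purity) * lym for the tumor value.
        tumor = (obs - (1.0 - purity) * lym) / purity
        # Methylation fractions lie in [0, 1]; clip values pushed outside that range.
        corrected.append(min(1.0, max(0.0, tumor)))
    return corrected

# Hypothetical example: simulate a mixed sample with 60% tumor cells, then recover the tumor profile.
tumor_true = [0.9, 0.1, 0.5]
lymph_ref = [0.2, 0.8, 0.5]
purity = 0.6
observed = [purity * t + (1 - purity) * l for t, l in zip(tumor_true, lymph_ref)]
recovered = subtract_lymphocyte_signal(observed, lymph_ref, purity)
print(recovered)
```

In practice the purity and the composition of the non-neoplastic fraction would themselves have to be estimated, which is part of why larger studies are needed.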
Conclusion
The researchers introduced a tool for sarcoma classification based on DNA methylation data and automated algorithmic analysis using probability measures. They also developed a webpage for the scientific community that lists characteristic features of the tumor methylation classes. This online platform provides a free upload service for locally generated methylation data, which are analyzed instantly, with results returned as molecular classifier reports that include a prediction confidence score. While the current version of the sarcoma classifier already includes some very rare entities, the researchers acknowledge that it does not cover the entire spectrum. Analysis of additional sarcoma samples, including uploaded data where permission is given, will further improve the tool by refining established methylation classes and adding novel ones.