I. Description of synthetic tumour generation methods
- Pseudo-algorithms for stochastic simulations and synthetically sampled tumours
- Supplementary Figure 1. Comparison of single population genetic statistics and deep learning models for differentiating between positive selection and neutral evolution
- Supplementary Figure 2. Predicting the number of subclones (0, 1, 2) in 2.8 million synthetic tumours
- Pseudo-algorithms for stochastic branching process (positive selection), generative sampling process (neutral evolution), and paired synthetic data generation
- Checking simulation model specification relative to sequenced patient tumours
- Supplementary Figure 3. Evaluating validity of synthetic data generation scheme with respect to real patient data (removal of low frequency variants based on mean sequencing depth)
- Supplementary Figure 4. Evaluating validity of synthetic data generation scheme with respect to real patient data (removal of low frequency variants based on mean effective coverage)
- Supplementary Figure 5. Comparison of nearest neighbour search when using mean effective coverage versus mean sequencing depth.
II. Deep learning model performance for base evolutionary inference tasks
- Examining probability estimates for detecting selection under varying subclone characteristics
- Supplementary Figure 6. Accurately detecting positive selection and subclonality is dependent on the sequencing depth, number of subclonal mutations, and subclone frequency at time of biopsy
- Evaluating model performance in 2.8 million simulated tumours
- Supplementary Figure 7. Comparison of single population genetic statistics and deep learning models for differentiating between positive selection and neutral evolution
- Supplementary Figure 8. Predicting the number of subclones (0, 1, 2) in 2.8 million synthetic tumours
- Supplementary Figure 9. Correlation between true subclone frequency and predicted subclone frequency using synthetic supervised learning (TumE) and a population genetics informed mixture model (MOBSTER)
- Supplementary Figure 10. Error in predicting the frequencies of 2 detectable subclones with synthetic supervised learning (TumE)
- Supplementary Figure 11. Relationship between subclone frequencies in the 2-subclone setting and the mean percentage error for the highest-frequency subclone (1st subclone).
- Supplementary Figure 12. Relationship between subclone frequencies in the 2-subclone setting and the mean percentage error for the lowest-frequency subclone (2nd subclone).
- Testing generalizability of selection estimates using an orthogonal cancer evolution simulator
- Supplementary Figure 13. Evaluation of TumE evolutionary classification estimates in an orthogonally simulated dataset of 900 synthetic tumours
- Supplementary Figure 14. Subclone frequency estimates are only accurate at detectable frequency ranges
- Impact of variable birth and death rates on estimating the number of subclones and subclone frequency
- Supplementary Figure 15. TumE performance (precision and recall) for predicting the number of subclones across 26 different birth rate and death rate combinations in 6.7 million synthetic tumours.
- Supplementary Figure 16. TumE performance (precision and recall) for predicting the frequency of a single subclone across 26 different birth rate and death rate combinations in 6.7 million synthetic tumours.
- Robustness to variable tumour purity and incorrect purity estimates
- Supplementary Figure 17. Evaluation of the peak-finding/heuristic VAF adjustment method.
- Supplementary Figure 18. False positive rate for positive selection (≥ 1 subclone) at variable sequencing depths, tumour purities, and errors in purity estimates in 6000 synthetic tumours.
III. Analysis in high-quality whole-genome and whole-exome sequenced tumour biopsies
- Application of TumE to high-quality PCAWG samples
- Supplementary Figure 19. VAF distributions with annotated TumE fits for 75 PCAWG samples with either zero or one detected subclone.
IV. Transfer learning with TumE
- Inferring additional evolutionary parameters using an alternative simulation framework, TEMULATOR
- Supplementary Figure 20. Viable fitness and emergence time parameter combinations for detectable subclones (~10–40% VAF) in the TEMULATOR simulation framework
- Supplementary Figure 21. Comparison of predictive performance for inferring evolutionary parameters with and without pre-trained TumE models.
- Supplementary Figure 22. Comparison of mean percentage error with and without post-hoc mutation rate correction.
- Supplementary Figure 23. Mean percentage error for inferring parameters from TEMULATOR simulations (mutation rate, subclone cellular fraction, subclone emergence time, and subclone fitness).