10 Synthetic Genomics Datasets

Biomedical · Texts · MIT · Introduced 2024-03-08

These are 10 synthetic genomics datasets generated with NEAT v3, based on the TP53 gene of Homo sapiens, for benchmarking somatic variant callers. To learn more about our generation framework, please visit the synth4bench GitHub repository.

The datasets explore intrinsic NGS data parameters in order to observe their effect on tumor-only somatic variant calling algorithms. Of the 10 datasets, 5 differ in coverage (with all other parameters held fixed) and 5 differ in read length. The reads in all datasets are paired-end.
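
The one-parameter-at-a-time design described above can be sketched in Python as follows. This is only an illustration of how such a grid might be enumerated: the `SimConfig` class, the `one_factor_grid` helper, and every numeric value are hypothetical placeholders, not the actual parameters of these datasets or part of the synth4bench code.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SimConfig:
    """Hypothetical container for the two parameters varied across datasets."""
    coverage: int            # mean sequencing depth
    read_length: int         # bases per read
    paired_end: bool = True  # all datasets use paired-end reads


# Placeholder values for illustration only (assumptions, not the real settings).
BASELINE = SimConfig(coverage=1000, read_length=150)
COVERAGES = [100, 500, 1000, 2000, 5000]
READ_LENGTHS = [50, 75, 100, 150, 300]


def one_factor_grid(baseline, coverages, read_lengths):
    """Vary one parameter at a time while holding the other at its baseline."""
    by_coverage = [replace(baseline, coverage=c) for c in coverages]
    by_read_length = [replace(baseline, read_length=r) for r in read_lengths]
    return by_coverage + by_read_length


configs = one_factor_grid(BASELINE, COVERAGES, READ_LENGTHS)
for cfg in configs:
    print(cfg)
```

Varying a single parameter per group keeps any change in variant caller behaviour attributable to that parameter alone, which is the point of holding everything else fixed.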