Amplicon Sequencing Analysis Pipeline (ASAP)

Description:

TGen and Northern Arizona University have developed a high-throughput bioinformatics tool for interpreting the significance of alignment data from amplicon sequencing. Amplicon sequencing allows rapid, cost-effective, highly multiplexed, and accurate detection of numerous clinically important targets directly in clinical samples. The Amplicon Sequencing Analysis Pipeline (ASAP) is a highly customizable pipeline for analyzing amplicon sequencing data. Although it is primarily focused on detecting antibiotic-resistant pathogens in clinical samples, its standardized target definitions and customizable output formats allow ASAP to be used with any amplicon targets on any type of sample. The data format is designed to be flexible enough to describe most PCR-based assays and to provide a means of automatically interpreting the sequencing reads aligned to the reference sequences.

ASAP provides a standardized format for defining multiple target assays, along with tools for aligning read data, interpreting the results according to the target definitions, and outputting them in a variety of customizable formats (HTML, Excel, PDF, etc.) and levels of detail, ranging from clinical summaries to full reports that include read counts and SNP positions for each target. In ASAP, amplicon sequence reads are first trimmed of adapter or read-through sequences and then mapped to the reference sequences with an aligner of the user's choice. The resulting alignment files are analyzed together with the assay descriptions to determine, for each amplicon, whether the target is present, its percent identity to the reference, the breadth and depth of coverage, and the proportions of nucleotide polymorphisms. ASAP outputs a detailed analysis of each amplicon target for each sample. The final outputs include a top-level clinical report showing the mutations present, subpopulation detection, and the drug significance of specific SNP locations, and a mid-level research report with additional detail, such as the number of aligned reads and the base and codon distributions at each of these locations.
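The per-amplicon metrics described above (breadth and depth of coverage, percent identity, and polymorphism proportions) can be illustrated with a short Python sketch. It is not ASAP's implementation. It assumes that per-position base counts have already been extracted from the alignment (for example, from a pileup), ignores insertions, deletions, and ambiguous bases, and approximates percent identity as the share of covered positions whose majority base matches the reference. The function name `amplicon_stats` and the thresholds `min_depth` and `min_proportion` are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

BASES = "ACGT"  # only unambiguous nucleotides are counted in this sketch


@dataclass
class AmpliconStats:
    breadth: float           # fraction of reference positions with depth >= min_depth
    mean_depth: float        # mean read depth across all reference positions
    percent_identity: float  # % of covered positions whose majority base matches the reference
    variants: list[tuple[int, str, str, float]]  # (position, ref base, alt base, proportion)


def amplicon_stats(reference, counts, min_depth=10, min_proportion=0.05):
    """Summarize coverage and polymorphisms for one amplicon (hypothetical helper).

    reference: reference sequence string; positions are 0-based.
    counts: one dict per reference position, mapping base -> number of reads.
    """
    if len(reference) != len(counts):
        raise ValueError("expected one count dict per reference position")

    # Depth at each position is the total number of A/C/G/T calls there.
    depths = [sum(c.get(b, 0) for b in BASES) for c in counts]
    covered = [i for i, d in enumerate(depths) if d >= min_depth]

    matches = 0
    variants = []
    for i in covered:
        ref_base = reference[i].upper()
        c = counts[i]
        # Majority (consensus) base; ties resolve to the first base in "ACGT".
        majority = max(BASES, key=lambda b: c.get(b, 0))
        if majority == ref_base:
            matches += 1
        # Report any non-reference base at or above the proportion threshold,
        # which is how a minor subpopulation would show up in this sketch.
        for b in BASES:
            if b == ref_base:
                continue
            proportion = c.get(b, 0) / depths[i]
            if proportion >= min_proportion:
                variants.append((i, ref_base, b, round(proportion, 3)))

    n = len(reference)
    return AmpliconStats(
        breadth=len(covered) / n if n else 0.0,
        mean_depth=sum(depths) / n if n else 0.0,
        percent_identity=100 * matches / len(covered) if covered else 0.0,
        variants=variants,
    )


if __name__ == "__main__":
    ref = "ACGTA"
    counts = [{"A": 20}, {"C": 18, "T": 2}, {"G": 5, "A": 15}, {"T": 3}, {"A": 12}]
    print(amplicon_stats(ref, counts))
```

With the toy counts above, position 3 (depth 3) falls below `min_depth`, so breadth is 0.8 and mean depth is 15.0. Position 2 is majority `A` against a reference `G`, giving 75.0% identity over covered positions. A 10% `T` subpopulation is reported at position 1. A real pipeline would also need to handle indels, base-quality filtering, and codon-level summaries, which this sketch leaves out.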

Issued US Patent: 11,386,977