Bradford Condon PhD

Bioinformatics, Web & Mobile Development


Examining bowtie output

This post is part 8 of a series on bioinformatics file formats, written for the 2017 UK-KBRIN Essentials of Next Generation Sequencing Workshop at the University of Kentucky.

Bowtie output

The Bowtie 2 manual is available here.

Bowtie 2 outputs SAM format. A detailed explanation of the SAM format output is available here.

Example output

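This example output is from the University of Michigan Center for Statistical Genomics. Its @PG header lines show that it was generated with bwa and GATK rather than Bowtie 2, but the SAM layout is the same.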

@HD	VN:1.0	SO:coordinate
@SQ	SN:1	LN:249250621	AS:NCBI37	UR:file:/data/local/ref/GATK/human_g1k_v37.fasta	M5:1b22b98cdeb4a9304cb5d48026a85128
@SQ	SN:2	LN:243199373	AS:NCBI37	UR:file:/data/local/ref/GATK/human_g1k_v37.fasta	M5:a0d9851da00400dec1098a9255ac712e
@SQ	SN:3	LN:198022430	AS:NCBI37	UR:file:/data/local/ref/GATK/human_g1k_v37.fasta	M5:fdfd811849cc2fadebc929bb925902e5
@RG	ID:UM0098:1	PL:ILLUMINA	PU:HWUSI-EAS1707-615LHAAXX-L001	LB:80	DT:2010-05-05T20:00:00-0400	SM:SD37743	CN:UMCORE
@RG	ID:UM0098:2	PL:ILLUMINA	PU:HWUSI-EAS1707-615LHAAXX-L002	LB:80	DT:2010-05-05T20:00:00-0400	SM:SD37743	CN:UMCORE
@PG	ID:bwa	VN:0.5.4
@PG	ID:GATK TableRecalibration	VN:1.0.3471	CL:Covariates=[ReadGroupCovariate, QualityScoreCovariate, CycleCovariate, DinucCovariate, TileCovariate], default_read_group=null, default_platform=null, force_read_group=null, force_platform=null, solid_recal_mode=SET_Q_ZERO, window_size_nqs=5, homopolymer_nback=7, exception_if_no_tile=false, ignore_nocall_colorspace=false, pQ=5, maxQ=40, smoothing=1

Body

The example below shows four alignments in SAM format.

1:497:R:-272+13M17D24M	113	1	497	37	37M	15	100338662	0	CGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAG	0;==-==9;>>>>>=>>>>>>>>>>>=>>>>>>>>>>	XT:A:U	NM:i:0	SM:i:37	AM:i:0	X0:i:1	X1:i:0	XM:i:0	XO:i:0	XG:i:0	MD:Z:37
19:20389:F:275+18M2D19M	99	1	17644	0	37M	=	17919	314	TATGACTGCTAATAATACCTACACATGTTAGAACCAT	>>>>>>>>>>>>>>>>>>>><<>>><<>>4::>>:<9	RG:Z:UM0098:1	XT:A:R	NM:i:0	SM:i:0	AM:i:0	X0:i:4	X1:i:0	XM:i:0	XO:i:0	XG:i:0	MD:Z:37
19:20389:F:275+18M2D19M	147	1	17919	0	18M2D19M	=	17644	-314	GTAGTACCAACTGTAAGTCCTTATCTTCATACTTTGT	;44999;499<8<8<<<8<<><<<<><7<;<<<>><<	XT:A:R	NM:i:2	SM:i:0	AM:i:0	X0:i:4	X1:i:0	XM:i:0	XO:i:1	XG:i:2	MD:Z:18^CA19
9:21597+10M2I25M:R:-209	83	1	21678	0	8M2I27M	=	21469	-244	CACCACATCACATATACCAAGCCTGGCTGTGTCTTCT	<;9<<5><<<<><<<>><<><>><9>><>>>9>>><>	XT:A:R	NM:i:2	SM:i:0	AM:i:0	X0:i:5	X1:i:0	XM:i:0	XO:i:1	XG:i:2	MD:Z:35

Let’s look at just the first alignment. The table below describes each of its fields.

Field | Alignment 1 value | Description
QNAME | 1:497:R:-272+13M17D24M | Query name
FLAG | 113 | Bitwise FLAG
RNAME | 1 | Reference name
POS | 497 | Leftmost mapping position
MAPQ | 37 | Mapping quality
CIGAR | 37M | CIGAR string (see below)
MRNM/RNEXT | 15 | Reference of the mate/next read
MPOS/PNEXT | 100338662 | Position of the mate/next read
ISIZE/TLEN | 0 | Template length
SEQ | CGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAG | Sequence
QUAL | 0;==-==9;>>>>>=>>>>>>>>>>>=>>>>>>>>>> | Phred-based quality scores
TAGS | XT:A:U NM:i:0 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:37 | Optional fields (see below)
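As a rough illustration of the field layout, the sketch below splits the first alignment on tabs and labels the eleven mandatory columns. The helper name parse_alignment and the choice of which columns to convert to integers are assumptions made for this example, not part of Bowtie or any SAM library.

```python
# Names of the 11 mandatory SAM columns, in file order.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
# Columns that hold integers.
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def parse_alignment(line):
    """Split one tab-delimited SAM alignment line into a dict (hypothetical helper)."""
    columns = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, columns[:11]))
    for name in INT_FIELDS:
        record[name] = int(record[name])
    # Everything after the 11th column is an optional TAG:TYPE:VALUE field.
    record["TAGS"] = columns[11:]
    return record


# The first alignment from the example above, rebuilt with explicit tabs.
line = "\t".join([
    "1:497:R:-272+13M17D24M", "113", "1", "497", "37", "37M",
    "15", "100338662", "0",
    "CGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAG",
    "0;==-==9;>>>>>=>>>>>>>>>>>=>>>>>>>>>>",
    "XT:A:U", "NM:i:0", "SM:i:37", "AM:i:0", "X0:i:1",
    "X1:i:0", "XM:i:0", "XO:i:0", "XG:i:0", "MD:Z:37",
])

record = parse_alignment(line)
for name in SAM_FIELDS:
    print(name, record[name], sep="\t")
print("TAGS", " ".join(record["TAGS"]), sep="\t")
```

For real analyses a dedicated SAM/BAM library is usually a better choice than hand-rolled splitting; this is only meant to make the column order concrete.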

FLAGs, CIGARs, and TAGS

The FLAG, CIGAR, and TAGS fields all describe metadata associated with the alignment. If you need this information, you will most likely be searching for a particular code or translating one, and there are tools available to decode these fields. For example, the Broad Institute provides a FLAG decoder alongside Picard tools that converts FLAGs into human-readable properties.
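To show what such a decoder does, here is a minimal sketch that tests each bit of a FLAG value. The bit meanings follow the SAM specification; the function name decode_flag and the wording of the descriptions are assumptions for this example and are not the Picard implementation.

```python
# Bit values and their meanings, as defined by the SAM specification.
FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails platform/vendor quality checks",
    0x400: "read is PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag):
    """Return the description of every bit set in a SAM FLAG (hypothetical helper)."""
    return [description for bit, description in FLAG_BITS.items() if flag & bit]


# FLAG of the first alignment: 113 = 1 + 16 + 32 + 64
print(decode_flag(113))
# ['read paired', 'read reverse strand', 'mate reverse strand', 'first in pair']
```

The same function applied to the other example FLAGs (99, 147, 83) shows that those reads are additionally mapped in a proper pair.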

The FLAG and CIGAR fields are covered in separate appendices in the workshop.
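As a quick preview of how a CIGAR string reads, the sketch below breaks one into (length, operation) pairs and counts how many reference and read bases it covers. The function names are made up for this example; the set of operations and which of them consume the reference or the read follow the SAM specification.

```python
import re

# One CIGAR element is a run length followed by a single operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Break a CIGAR string into (length, operation) pairs (hypothetical helper)."""
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def reference_span(cigar):
    """Number of reference bases covered: M, D, N, = and X consume the reference."""
    return sum(length for length, op in parse_cigar(cigar) if op in "MDN=X")


def read_length(cigar):
    """Number of read bases described: M, I, S, = and X consume the read."""
    return sum(length for length, op in parse_cigar(cigar) if op in "MIS=X")


# CIGAR strings taken from the example alignments above.
for cigar in ["37M", "18M2D19M", "8M2I27M"]:
    print(cigar, parse_cigar(cigar), reference_span(cigar), read_length(cigar))
```

All three reads are 37 bases long, but the one with a 2-base deletion spans 39 reference bases and the one with a 2-base insertion spans only 35.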

TAGs

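TAGs take the form [A-Za-z][A-Za-z0-9]:[AifZH]:.*, that is, a two-character tag name, a one-character value type, and the value itself, separated by colons. The second character of the name may be a digit, as in X0 and X1 in the example above.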
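A small sketch of how this pattern can be used to pull a TAG apart is shown below. The pattern in the code follows the current SAM specification, which allows a digit as the second character of the tag name (as in X0) and adds a B array type; the helper name parse_tag and the decision to convert only i and f values are assumptions for this example.

```python
import re

# TAG:TYPE:VALUE, with the name and type characters allowed by the SAM specification.
TAG_PATTERN = re.compile(r"^([A-Za-z][A-Za-z0-9]):([AifZHB]):(.*)$")


def parse_tag(field):
    """Split one optional field into (tag, type, value) (hypothetical helper)."""
    match = TAG_PATTERN.match(field)
    if match is None:
        raise ValueError(f"not a valid SAM tag: {field!r}")
    tag, value_type, value = match.groups()
    if value_type == "i":
        value = int(value)      # signed integer
    elif value_type == "f":
        value = float(value)    # floating-point number
    return tag, value_type, value


# Optional fields from the third example alignment.
for field in ["XT:A:R", "NM:i:2", "X0:i:4", "MD:Z:18^CA19"]:
    print(parse_tag(field))
```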