TY - JOUR
T1 - Exonsampler
T2 - A computer program for genome-wide and candidate gene exon sampling for targeted next-generation sequencing
AU - Cosart, Ted
AU - Beja-Pereira, Albano
AU - Luikart, Gordon
N1 - Publisher Copyright: © 2014 John Wiley & Sons Ltd.
PY - 2014/11/1
Y1 - 2014/11/1
AB - The computer program exonsampler automates the sampling of thousands of exon sequences from publicly available reference genome sequences and gene annotation databases. It was designed to provide exon sequences for the efficient, next-generation gene sequencing method called exon capture. The exon sequences can be sampled by a list of gene name abbreviations (e.g. IFNG, TLR1), or by sampling exons from genes spaced evenly across chromosomes. It provides a list of genomic coordinates (a bed file), as well as a set of sequences in fasta format. User-adjustable parameters for collecting exon sequences include a minimum and maximum acceptable exon length, maximum number of exonic base pairs (bp) to sample per gene, and maximum total bp for the entire collection. It allows for partial sampling of very large exons. It can preferentially sample upstream (5 prime) exons, downstream (3 prime) exons, both external exons, or all internal exons. It is written in the Python programming language using its free libraries. We describe the use of exonsampler to collect exon sequences from the domestic cow (Bos taurus) genome for the design of an exon-capture microarray to sequence exons from related species, including the zebu cow and wild bison. We collected ~10% of the exome (~3 million bp), including 155 candidate genes, and ~16 000 exons evenly spaced genomewide. We prioritized the collection of 5 prime exons to facilitate discovery and genotyping of SNPs near upstream gene regulatory DNA sequences, which control gene expression and are often under natural selection.
KW - Bioinformatics
KW - Exon capture
KW - Exon sequences
KW - Next-generation sequencing
UR - http://www.scopus.com/inward/record.url?scp=84908012186&partnerID=8YFLogxK
U2 - 10.1111/1755-0998.12267
DO - 10.1111/1755-0998.12267
M3 - Article
C2 - 24751285
AN - SCOPUS:84908012186
SN - 1755-098X
VL - 14
SP - 1296
EP - 1301
JO - Molecular Ecology Resources
JF - Molecular Ecology Resources
IS - 6
ER -