Bioinformatics Tools for Analyzing Next-Generation Sequencing (NGS) Data
- Commonly used tools for the bioinformatics analysis of NGS datasets, although many of these tools can also be used for other purposes (such as microarray data, Sanger sequencing and proteomics).
- This list is non-exhaustive, but it includes some of the most up-to-date and most commonly used tools.
- Most of these tools are open source and free.
- Most of these tools are run from the command line, although some also provide a GUI (Graphical User Interface).
Keeping up to date with sequencing platforms and costs
- Next Generation Field Guide by Travis Glenn - Originally published in 2011 in Molecular Ecology Resources and kept up to date every year since. A thorough review of current sequencers, their costs, and their pros and cons.
Data manipulation
- Picard Tools - A set of Java command-line tools for manipulating high-throughput sequencing (HTS) data and formats.
- FastQC - A quality control tool for high-throughput sequence data.
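To give a rough idea of one summary that quality-control tools such as FastQC report (per-base sequence quality), the minimal Python sketch below parses FASTQ records and computes the mean Phred score at each read position. It assumes Phred+33 quality encoding and well-formed, non-wrapped four-line records; `read_fastq` and `mean_quality_per_position` are hypothetical helpers, not FastQC's implementation.

```python
from io import StringIO
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from four-line FASTQ records.

    Hypothetical helper: assumes well-formed, non-wrapped records.
    """
    while True:
        header = handle.readline().rstrip()
        if not header:
            return  # end of input
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual


def mean_quality_per_position(handle: TextIO, offset: int = 33) -> List[float]:
    """Mean Phred score at each read position (assumes Phred+33 encoding)."""
    totals: List[int] = []
    counts: List[int] = []
    for _, _, qual in read_fastq(handle):
        for i, ch in enumerate(qual):
            if i == len(totals):  # first read long enough to reach position i
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset  # ASCII character -> Phred score
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


example = StringIO(
    "@read1\nACGT\n+\nIIII\n"  # 'I' = Phred 40
    "@read2\nACG\n+\n###\n"    # '#' = Phred 2
)
print(mean_quality_per_position(example))  # [21.0, 21.0, 21.0, 40.0]
```

Reads of different lengths are handled by keeping a separate count per position, so later positions are averaged only over the reads that reach them.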
Sequence alignment
- The Genome Analysis Toolkit (GATK) - Software package developed at the Broad Institute to analyze high-throughput sequencing data.
- BWA - Maps sequences (e.g., 454, Illumina) against a large reference genome, such as the human genome.
- Bowtie - An ultrafast, memory-efficient short-read aligner.
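BWA and Bowtie both index the reference with the Burrows-Wheeler transform (BWT) and locate exact matches by FM-index backward search. The Python sketch below illustrates only that core idea on a toy string, assuming a '$' sentinel that does not occur in the text. It builds the suffix array naively, counts exact matches only, and ignores mismatches, gaps, base qualities and the compressed structures real aligners use; `bwt_index` and `count_matches` are hypothetical names.

```python
from typing import Dict, List


def bwt_index(text: str):
    """Build the BWT, C table, occurrence table and suffix array of `text`.

    Assumption: '$' does not occur in `text`. The suffix array is built by
    sorting all suffixes, fine for a toy example but far too slow for a genome.
    """
    text += "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
    alphabet = sorted(set(text))
    # c_table[ch]: number of characters in text that sort strictly before ch
    c_table: Dict[str, int] = {}
    total = 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[ch][i]: number of occurrences of ch in bwt[:i]
    occ: Dict[str, List[int]] = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return bwt, c_table, occ, sa


def count_matches(pattern: str, c_table, occ, n: int) -> int:
    """Backward search: number of exact occurrences of `pattern`."""
    lo, hi = 0, n  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):  # extend the match one character leftwards
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


bwt, c_table, occ, sa = bwt_index("banana")
print(bwt)  # annb$aa
print(count_matches("ana", c_table, occ, len(bwt)))  # 2
```

The final interval [lo, hi) also indexes into `sa`, which is how match positions would be recovered.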
de novo transcriptome assembly
- Trinity - Efficient and robust de novo reconstruction of transcriptomes from RNA-seq data.
- Trans-ABySS - De novo assembly of RNA-seq data using ABySS.
- SOAPdenovo-Trans - A de novo transcriptome assembler based on the SOAPdenovo framework, adapted to alternative splicing and to differing expression levels among transcripts.
- MIRA - Sequence assembler and sequence mapper for whole-genome shotgun and EST/RNA-seq sequencing data.
de novo genome assembly
- SOAPdenovo2 - A short-read assembly method that can build a de novo draft assembly of human-sized genomes.
- Velvet (https://www.ebi.ac.uk/~zerbino/velvet/) - A sequence assembler for very short reads.
- ABySS - Assembly By Short Sequences, a de novo, parallel, paired-end sequence assembler.
- MIRA - Sequence assembler and sequence mapper for whole-genome shotgun and EST/RNA-seq sequencing data.
- IDBA-UD - Suited to plastid reads; deals quite well with uneven coverage.
- SPAdes (http://bioinf.spbau.ru/spades) - Suited to plastid reads; deals quite well with uneven coverage.
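Several of the genome assemblers above (for example Velvet, ABySS, SOAPdenovo2, IDBA-UD and SPAdes) are built around de Bruijn graphs. The Python sketch below shows the basic idea under strong simplifying assumptions: error-free reads, a single strand (no reverse complements), a single fixed k, and no coverage information. Nodes are (k-1)-mers, each distinct k-mer is an edge, and maximal non-branching paths are spelled out as unitigs; isolated cycles are ignored. `build_de_bruijn` and `unitigs` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_de_bruijn(reads: List[str], k: int) -> Dict[str, Set[str]]:
    """Nodes are (k-1)-mers; each distinct k-mer adds an edge prefix -> suffix."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph: Dict[str, Set[str]]) -> List[str]:
    """Spell out maximal non-branching paths (isolated cycles are skipped)."""
    indeg: Dict[str, int] = defaultdict(int)
    for succs in graph.values():
        for s in succs:
            indeg[s] += 1
    nodes = set(graph) | set(indeg)

    def is_1_in_1_out(n: str) -> bool:
        return indeg[n] == 1 and len(graph.get(n, ())) == 1

    contigs: List[str] = []
    for node in sorted(nodes):
        if is_1_in_1_out(node):
            continue  # interior node: reached by walking from a path start
        for nxt in sorted(graph.get(node, ())):
            contig = node + nxt[-1]
            cur = nxt
            while is_1_in_1_out(cur):  # extend through unambiguous nodes
                cur = next(iter(graph[cur]))
                contig += cur[-1]
            contigs.append(contig)
    return contigs


print(unitigs(build_de_bruijn(["ACGTTG", "GTTGCA"], k=4)))  # ['ACGTTGCA']
```

The two overlapping reads collapse into one unitig because every interior (k-1)-mer has exactly one predecessor and one successor; a branch (for example from a sequencing error or a repeat) would split the output into several unitigs.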
Variant calling (SNPs / short indels)
- SAMtools - A suite of programs for interacting with high-throughput sequencing data.
- The Genome Analysis Toolkit (GATK) - A variety of tools with a primary focus on variant discovery and genotyping, as well as a strong emphasis on data quality assurance.
Bioinformatics tools geared specifically towards GBS and RAD data
- TASSEL - A bioinformatics software package that can analyze diversity for sequences, SNPs, or SSRs.
- Stacks - A software pipeline for building loci from short-read sequences, such as those generated on the Illumina platform. Stacks was developed to work with restriction enzyme-based data, such as RAD-seq, for building genetic maps and conducting population genomics and phylogeography studies.
- pyRAD - Analyzes RAD, ddRAD, GBS, paired-end ddRAD and paired-end GBS datasets.
Bioinformatics tools geared specifically towards gene expression (RNA-seq) analyses
- TopHat - Aligns RNA-seq reads to mammalian-sized genomes using the ultra-high-throughput short-read aligner Bowtie, then analyzes the mapping results to identify splice junctions between exons. Useful for analyzing splice variants and their expression in NGS datasets.
- There are also several R packages, listed here, that are specifically geared towards gene expression analyses.
All-in-one proprietary software
- Geneious - A comprehensive bioinformatics software platform.
- CLC Genomics Workbench - Software for analyzing and visualizing next-generation sequencing data.
Gene Ontology (GO) analyses
- Blast2GO - Functional annotation of (novel) sequences and analysis of annotation data. Also has a GUI.
- ErmineJ - Analysis of gene sets in high-throughput genomics data, such as gene expression profiling studies. Also has a GUI.
- DAVID - A comprehensive set of functional annotation tools to help investigators understand the biological meaning behind large lists of genes.
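A common building block of GO over-representation analysis is a one-sided hypergeometric test: given a background of N genes, K of which carry a GO term, how surprising is it that k of the n genes in a study list carry it? The Python sketch below is a generic version of that test, not the exact statistic of any tool listed here (DAVID, for instance, reports a modified Fisher exact p-value, the EASE score). `hypergeom_enrichment_p` is a hypothetical name.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background (e.g. all annotated genes)
    K: background genes annotated with the GO term
    n: genes in the study list
    k: study-list genes annotated with the GO term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    upper = min(K, n)  # cannot draw more annotated genes than exist or than drawn
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


print(hypergeom_enrichment_p(10, 4, 3, 3))  # 4/120, about 0.0333
```

In a real analysis this test is repeated for every GO term, so the resulting p-values need a multiple-testing correction before interpretation.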
Microbial diversity / ecology
- See also this page on the wiki.
- mothur - A comprehensive bioinformatics software platform for microbial ecology (e.g., diversity of 16S rRNA gene sequences).
- Quantitative Insights Into Microbial Ecology (QIIME) - Another comprehensive bioinformatics software platform for microbial ecology, designed primarily for high-throughput amplicon sequencing data (such as SSU rRNA). Also has a GUI.
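Both mothur and QIIME report alpha-diversity indices, and the Shannon index is one of the most common. The minimal Python sketch below computes it from per-OTU read counts using the natural logarithm; tools differ in their choice of log base, so values from different programs are not necessarily directly comparable. `shannon_index` is a hypothetical name.

```python
from math import log
from typing import Sequence


def shannon_index(counts: Sequence[int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("need at least one observed read")
    # OTUs with zero reads contribute nothing (p * ln p -> 0 as p -> 0)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


print(round(shannon_index([25, 25, 25, 25]), 3))  # 1.386 (= ln 4)
```

Four equally abundant OTUs give ln 4, the maximum for four OTUs; a single dominant OTU drives the index towards 0.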
Others
- CD-HIT - Clustering and comparing protein or nucleotide sequences.
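CD-HIT is commonly described as a greedy incremental clustering method: sequences are sorted by decreasing length, the longest becomes the first cluster representative, and each remaining sequence either joins a cluster whose representative it matches above an identity threshold or starts a new cluster. The Python sketch below follows that outline, assigning each sequence to the first matching representative. As a loudly stated assumption, identity is approximated with `difflib.SequenceMatcher` matching blocks divided by the shorter sequence's length (a heuristic, not a true alignment), and CD-HIT's short-word filter is omitted. `identity` and `greedy_cluster` are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Matched residues / length of the shorter sequence.

    Assumption: difflib's matching blocks stand in for a real alignment.
    """
    sm = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in sm.get_matching_blocks())
    return matched / min(len(a), len(b))


def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.9) -> Dict[str, List[str]]:
    """Greedy incremental clustering; longest sequences become representatives."""
    clusters: Dict[str, List[str]] = {}  # representative name -> member names
    # Process sequences from longest to shortest (ties broken by name).
    for name in sorted(seqs, key=lambda n: (-len(seqs[n]), n)):
        for rep in clusters:
            if identity(seqs[rep], seqs[name]) >= threshold:
                clusters[rep].append(name)  # join the first matching cluster
                break
        else:
            clusters[name] = [name]  # no close representative: new cluster
    return clusters


seqs = {"s1": "ACGTACGTAC", "s2": "ACGTACGTA", "s3": "TTTTTGGGGG"}
print(greedy_cluster(seqs))  # {'s1': ['s1', 's2'], 's3': ['s3']}
```

Processing the longest sequences first means each representative is at least as long as the members assigned to it, which is why identity is measured against the shorter sequence.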