====== Bioinformatics Tools for Analyzing High-Throughput Sequencing (NGS) Data ======

  * Commonly used tools for bioinformatic analyses of NGS datasets, although many of them can also be used for other purposes (such as microarray data, Sanger sequencing, proteomics).
  * This list is not exhaustive, but it covers some of the most up-to-date and most commonly used tools.
  * Most of these tools are open source and free.
  * Most of these tools are run from the command line, although some also provide a GUI (**G**raphical **U**ser **I**nterface).

==== Keeping up to date with sequencing platforms and costs ====

  * [[http://www.molecularecologist.com/next-gen-fieldguide-2014/|Next Generation Field Guide by Travis Glenn]] - Originally published in 2011 in [[http://onlinelibrary.wiley.com/doi/10.1111/j.1755-0998.2011.03024.x/abstract;jsessionid=A90E6AD25AB3AD7E1D9FA6D5729C685B.f03t03|Molecular Ecology Resources]] and updated every year. A thorough review of current sequencers, their costs, and their pros and cons.

==== Data manipulation ====

  * [[http://broadinstitute.github.io/picard/|Picard Tools]] - A set of Java command line tools for manipulating high-throughput sequencing (HTS) data and formats.
  * [[http://www.bioinformatics.babraham.ac.uk/projects/fastqc/|FASTQC]] - A quality control tool for high-throughput sequence data.

==== Sequence alignment ====

  * [[https://www.broadinstitute.org/gatk/|The Genome Analysis Toolkit (GATK)]] - Software package developed at the Broad Institute to analyze high-throughput sequencing data.
  * [[http://bio-bwa.sourceforge.net|bwa]] - Maps sequences (e.g. 454, Illumina) against a large reference genome, such as the human genome.
  * [[http://bowtie-bio.sourceforge.net/index.shtml|bowtie]] - Ultrafast, memory-efficient short read aligner.
==== de novo transcriptome assembly ====

  * [[http://trinityrnaseq.sourceforge.net/|trinity]] - Efficient and robust de novo reconstruction of transcriptomes from RNA-seq data.
  * [[http://www.bcgsc.ca/platform/bioinfo/software/trans-abyss|Trans-ABySS]] - de novo assembly of RNA-Seq data using ABySS.
  * [[http://soap.genomics.org.cn/SOAPdenovo-Trans.html|SOAPdenovo-Trans]] - de novo transcriptome assembler based on the SOAPdenovo framework, adapted to alternative splicing and to different expression levels among transcripts.
  * [[http://sourceforge.net/projects/mira-assembler/|MIRA]] - Sequence assembler and mapper for whole genome shotgun and EST / RNASeq sequencing data.

==== de novo genome assembly ====

  * [[http://sourceforge.net/projects/soapdenovo2/|soapdenovo2]] - Short-read assembly method that can build a de novo draft assembly for human-sized genomes.
  * [[https://www.ebi.ac.uk/~zerbino/velvet/|Velvet]] - A sequence assembler for very short reads.
  * [[http://www.bcgsc.ca/platform/bioinfo/software/abyss|abyss]] - **A**ssembly **By** **S**hort **S**equences: a de novo, parallel, paired-end sequence assembler.
  * [[http://sourceforge.net/projects/mira-assembler/|MIRA]] - Sequence assembler and mapper for whole genome shotgun and EST / RNASeq sequencing data.
  * [[http://i.cs.hku.hk/~alse/hkubrg/projects/idba_ud/|IDBA-UD]] - For plastid reads; deals quite well with uneven coverage.
  * [[http://bioinf.spbau.ru/spades|SPADES]] - For plastid reads; deals quite well with uneven coverage.

==== Variant calling (SNPs / short indels) ====

  * [[http://www.htslib.org/|samtools]] - A suite of programs for interacting with high-throughput sequencing data.
  * [[https://www.broadinstitute.org/gatk/|The Genome Analysis Toolkit (GATK)]] - A variety of tools with a primary focus on variant discovery and genotyping, as well as a strong emphasis on data quality assurance.
==== Bioinformatics tools geared specifically towards GBS and RAD data ====

  * [[http://sourceforge.net/projects/tassel/|tassel]] - TASSEL is a bioinformatics software package that can analyze diversity for sequences, SNPs, or SSRs.
  * [[http://creskolab.uoregon.edu/stacks/|stacks]] - Software pipeline for building loci from short-read sequences, such as those generated on the Illumina platform. Stacks was developed to work with restriction enzyme-based data, such as RAD-seq, for the purpose of building genetic maps and conducting population genomics and phylogeography.
  * [[https://github.com/dereneaton/pyrad/releases|pyRAD]] - pyRAD can analyze RAD, ddRAD, GBS, paired-end ddRAD and paired-end GBS data sets.

==== Bioinformatics tools geared specifically towards gene expression (RNAseq) analyses ====

  * [[http://ccb.jhu.edu/software/tophat/index.shtml|tophat]] - Aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons. Useful for analysing splice variants and their expression from NGS datasets.
  * Several R packages geared specifically towards gene expression analyses are also listed [[http://qcbs.ca/wiki/resources_for_r|here]].

==== All-in-one proprietary software ====

  * [[http://www.geneious.com/|geneious]] - Comprehensive bioinformatics software platform.
  * [[http://www.clcbio.com/products/clc-genomics-workbench/|CLC Genomics Workbench]] - Software for analyzing and visualizing next generation sequencing data.

==== Gene Ontology (GO) analyses ====

  * [[https://www.blast2go.com/b2ghome|Blast2GO]] - Functional annotation of (novel) sequences and analysis of annotation data. Also has a GUI.
  * [[http://erminej.chibi.ubc.ca/|ErmineJ]] - Analyses of gene sets in high-throughput genomics data, such as gene expression profiling studies. Also has a GUI.
  * [[http://david.abcc.ncifcrf.gov/|DAVID]] - Comprehensive set of functional annotation tools for investigators to understand the biological meaning behind large lists of genes.

==== Microbial diversity / ecology ====

  * See also [[http://qcbs.ca/wiki/guide_to_amplicon_sequencing_experiment|this]] page on the wiki.
  * [[http://www.mothur.org/wiki/Main_Page|mothur]] - A comprehensive bioinformatics software platform for microbial ecology (e.g. 16S rRNA gene sequence diversity).
  * [[http://qiime.org/|Quantitative Insights Into Microbial Ecology (Qiime)]] - Another comprehensive bioinformatics software platform for microbial ecology, primarily based on high-throughput amplicon sequencing data (such as SSU rRNA). Also has a GUI.

==== Others ====

  * [[http://weizhong-lab.ucsd.edu/cd-hit/|cd-hit]] - Clustering and comparing protein or nucleotide sequences.