Differences

This shows you the differences between two versions of the page.

blast_introduction [2015/04/14 12:27]
sebastien.renaut
blast_introduction [2015/04/14 12:40]
sebastien.renaut
Line 1: Line 1:
 ====Comparing genes and genomes using bioinformatics====
  
-The BLAST general program [[wp>BLAST]] and its specialized derivative programs (Primer-BLAST, conserved domains, vector contamination, Align two sequences, Global Sequence Alignment Tool, WGS sequences) may be the most widely used tools in bioinformatics. The BLAST algorithm ((Altschul S, Madden T, Schaffer A, Zhang J, Zhang Z, Miller W, Lipman D (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25:3389-3402)) is optimized for speed  and is used to search protein and DNA databases for sequence similarities.
+The [[http://qcbs.ca/wiki/blast|BLAST]] program and its specialized derivative programs (Primer-BLAST, conserved domains, vector contamination, Align two sequences, Global Sequence Alignment Tool, WGS sequences) may be some of the most widely used tools in bioinformatics. The BLAST algorithm ((Altschul S, Madden T, Schaffer A, Zhang J, Zhang Z, Miller W, Lipman D (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25:3389-3402)) is optimized for speed and is used to search protein and DNA databases for sequence similarities.
  
 ===Useful knowledge about BLAST===
Line 7: Line 7:
 The National Center for Biotechnology Information ([[http://blast.ncbi.nlm.nih.gov/Blast.cgi|NCBI]]) offers a web interface to search by BLAST within its exhaustive databases for sequences that could be similar to the user’s query sequence.
  
-The BLAST program can also be installed on a personal computer, rather than being accessed through the internet. This is necessary when one wishes to search the similarity of a query sequence to a set of sequences (the database) that are not yet included in GenBank or any other public sequence database (DDBJ, EMBL). The BLAST programs to install on a personal computer (standalone) are available at the [[http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download|NCBI Blast]] and from [[http://www.blaststation.com/freestuff/en/benchmarkBlastMac.html|BlastStation]]. In February 2011, the current version was ncbi-blast-2.2.24, and the [[http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download|2.2.26 version]] was released in March 2012. A page from NCBI is useful for the [[http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/unix_setup.html|setup of Command Line BLAST]], and the [[http://www.ncbi.nlm.nih.gov/books/NBK1762/|NCBI help pages about Blast]] are resourceful.
+The BLAST program can also be installed on a personal computer, rather than being accessed through the internet. This is necessary when one wishes to search the similarity of a query sequence to a set of sequences (the database) that are not yet included in GenBank or any other public sequence database (DDBJ, EMBL). The standalone version of BLAST, to be installed on a personal computer, is available on the [[http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download|NCBI]] website. As of March 2015, the current version is ncbi-blast-2.2.30. NCBI offers information regarding setup, help pages and FAQs: [[http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/unix_setup.html|setup of Command Line BLAST]], [[http://www.ncbi.nlm.nih.gov/books/NBK1762/|NCBI help pages about Blast]].
  
-==Batch blast==
+==Batch BLAST==
-Whether you access BLAST through the NCBI website or from your personal computer, you may compare multiple query sequences at a time, in a batch BLAST. In that case, the resulting BLAST output file will be very long. You can save the output file in various formats, such as .txt, .xml or .csv.
+Whether you access BLAST through the NCBI website or from your personal computer, you may need to compare multiple query sequences at a time, in a batch BLAST. In that case, the resulting BLAST output file can be very long. You can save the output file in various formats, such as .txt, .xml or .csv.
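Because a batch output file can be very long, it is usually filtered by a script rather than read by eye. The following is a minimal sketch, not part of the original page, that keeps the best hit per query from a .csv output. It assumes the file was written by standalone BLAST+ in its comma-separated tabular format (`-outfmt 10`) with the default 12 columns; the function name, the E-value threshold and the sequence identifiers in the sample are hypothetical.

```python
import csv
import io

# Assumed column order: the default 12 fields of BLAST+ tabular output.
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hit_per_query(handle, max_evalue=1e-5):
    """Return {query id: hit} keeping the lowest E-value hit for each query.

    Hits with an E-value above max_evalue (an arbitrary, illustrative
    threshold) are ignored.
    """
    best = {}
    for row in csv.reader(handle):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        evalue = float(hit["evalue"])
        if evalue > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or evalue < float(current["evalue"]):
            best[hit["qseqid"]] = hit
    return best


# Hypothetical three-line output for two queries.
sample = io.StringIO(
    "q1,s1,98.5,200,3,0,1,200,5,204,1e-50,370\n"
    "q1,s2,80.0,150,30,0,1,150,1,150,1e-10,120\n"
    "q2,s3,60.0,50,20,0,1,50,1,50,0.5,30\n"
)
for query, hit in best_hit_per_query(sample).items():
    print(query, hit["sseqid"], hit["evalue"])
```

For a real file, pass an open file handle instead of the in-memory sample. If the output was saved with a different set of columns, `COLUMNS` has to be adjusted to match.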
  
 ==Glossary==
   *Query: The sequence a user wants to get more information about
   *Database: a large set of sequences the query is compared to
-  *e-value: a number that assesses how likely a similarity is to arise by chance. It involves a model of random sequences. The lower the number the less likely the similarity occurred by chance. The [[http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html|NCBI Blast tutorial]] explains: “Expect value. The E-value is a parameter that describes the number of hits one can “expect” to see by chance when searching a database of a particular size. It decreases exponentially with the score (S) that is assigned to a match between two sequences. Essentially, the E-value describes the random background noise that exists for matches between sequences. For example, an E-value of 1 assigned to a hit can be interpreted as meaning that in a database of the current size, one might expect to see one match with a similar score simply by chance. This means that the lower the E-value, or the closer it is to “0”, the higher is the “significance” of the match. However, it is important to note that searches with short sequences can be virtually identical and have relatively high E-value. This is because the calculation of the E-value also takes into account the length of the query sequence. This is because shorter sequences have a high probability of occurring in the database purely by chance.” Find more information in the [[http://www.ncbi.nlm.nih.gov/books/NBK21106/|NCBI online books]].
+  *e-value: a number that assesses how likely a similarity is to arise by chance. It involves a model of random sequences. The smaller the number the less likely the similarity occurred by chance. The [[http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html|NCBI Blast tutorial]] explains: “Expect value. The E-value is a parameter that describes the number of hits one can “expect” to see by chance when searching a database of a particular size. It decreases exponentially with the score (S) that is assigned to a match between two sequences.
+Essentially, the E-value describes the random background noise that exists for matches between sequences. For example, an E-value of 1 assigned to a hit can be interpreted as meaning that in a database of the current size, one might expect to see one match with a similar score simply by chance. This means that the lower the E-value, or the closer it is to “0”, the higher is the “significance” of the match. However, it is important to note that searches with short sequences can be virtually identical and have relatively high E-value. This is because the calculation of the E-value also takes into account the length of the query sequence. This is because shorter sequences have a high probability of occurring in the database purely by chance.” Find more information in the [[http://www.ncbi.nlm.nih.gov/books/NBK21106/|NCBI online books]].
   *HSP: high-scoring segment pairs; all segment pairs whose scores can not be improved by extension or trimming
   *Identities = Proportion of identical residues between the query and the hit from database,
Line 21: Line 21:
  
  
-Examples of uses of the BLAST program are described in this wiki http://qcbs.ca/wiki/commandline_remote_blast and http://qcbs.ca/wiki/standalone_blast
+Examples of uses of the BLAST program are described [[http://qcbs.ca/wiki/standalone_blast|here]] and [[http://qcbs.ca/wiki/commandline_remote_blast|here]] on the wiki.
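
The e-value entry in the glossary says that the E-value decreases exponentially with the score and depends on the database size and the query length. This behaviour follows from the Karlin-Altschul relation E = K · m · n · exp(−λS), where m is the query length, n the database length and S the raw alignment score. The sketch below is an illustration added here, not taken from the page. The default values of λ and K are assumptions, chosen to resemble those commonly reported for gapped protein searches with BLOSUM62; real BLAST also corrects m and n to "effective" lengths, which is omitted.

```python
import math


def expect_value(score, query_len, db_len, lam=0.267, k=0.041):
    """Simplified Karlin-Altschul E-value: E = K * m * n * exp(-lambda * S).

    lam and k are illustrative defaults (assumptions); BLAST derives them
    from the scoring matrix and gap penalties, and it uses effective
    lengths rather than the raw lengths used here.
    """
    return k * query_len * db_len * math.exp(-lam * score)


# Same raw score and query length, databases of increasing size:
# the expected number of chance hits grows with the database.
for db_len in (1_000_000, 10_000_000, 100_000_000):
    print(db_len, expect_value(score=60, query_len=100, db_len=db_len))

# Same database, increasing score: the E-value drops exponentially.
for score in (40, 60, 80):
    print(score, expect_value(score=score, query_len=100, db_len=10_000_000))
```

A short query can only reach a low maximum score, even when the match is perfect, so its E-value stays comparatively high. This is the effect described at the end of the quoted NCBI explanation.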