Install BLAST+ locally.

Find and install the latest version for your operating system (macOS, Windows or Linux) from: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/
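After installing, you can check that the BLAST+ programs used on this page are on your PATH. The sketch below is a POSIX shell function. Its name, `require_tools`, and its default list of programs are assumptions for illustration and are not part of BLAST+.

```shell
# Hypothetical helper (not part of BLAST+): report programs missing from PATH.
# With no arguments it checks the BLAST+ programs used on this page (assumed list).
require_tools() {
  if [ "$#" -eq 0 ]; then
    set -- makeblastdb blastn blastx blastp update_blastdb.pl
  fi
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `$ require_tools && echo "BLAST+ found"` prints the message only if every program is found. Otherwise it lists the missing ones on stderr.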

### DOWNLOAD A PREFORMATTED DATABASE FROM NCBI ###

$ update_blastdb.pl nr    # protein database; download nt instead for BLASTn searches
$ for f in *.tar.gz; do tar -xvzf "$f"; done

### MAKE A REFERENCE DATABASE ###

$ makeblastdb -dbtype nucl -in genes1_fas_pathway.txt

OR

$ makeblastdb -dbtype prot -in Plantcyc_Enzymes_Without_Tags_BLASTset.fasta
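The -dbtype given to makeblastdb has to match the search program. BLASTn, tBLASTn and tBLASTx search nucleotide databases, while BLASTp and BLASTx search protein databases. This is why the nucleotide pathway file is searched with blastn and the PlantCyc enzyme file with blastx. The small lookup function below makes that pairing explicit. It is a sketch only, and the name `blast_dbtype` is hypothetical.

```shell
# Hypothetical helper: print the makeblastdb -dbtype that a BLAST+ program searches against.
blast_dbtype() {
  case "$1" in
    blastn|tblastn|tblastx) echo nucl ;;  # nucleotide database
    blastp|blastx)          echo prot ;;  # protein database
    *) echo "unknown BLAST+ program: $1" >&2; return 1 ;;
  esac
}
```

Usage: `$ makeblastdb -dbtype "$(blast_dbtype blastx)" -in Plantcyc_Enzymes_Without_Tags_BLASTset.fasta`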

### BLASTn against NCBI nt (XML output) ###

$ blastn -task dc-megablast -evalue 1e-10 -max_target_seqs 100 -query mygenes.fasta -db nt -num_threads 12 -outfmt 5 -out mygenes.blast.out

### BLASTn against a local reference database (tabular output) ###

$ blastn -task dc-megablast -evalue 1e-10 -max_target_seqs 5 -query mygenes.fasta -db database/ara_pathway/genes1_fas_pathway.txt -num_threads 4 -outfmt 6 -out mygenes.blast.out

### BLASTn – short sequences ###

$ nohup blastn -task blastn-short -evalue 1e-5 -max_target_seqs 20 -query mygenes.fasta -db ../../blast/database/all_mito_sequences_e50 -num_threads 8 -outfmt 6 -out mygenes.blast.out > log &

### BLASTx with nohup ###

$ nohup blastx -evalue 1e-10 -max_target_seqs 5 -query mygenes.fasta -db database/ara_pathway/Plantcyc_Enzymes_Without_Tags_BLASTset.fasta -num_threads 6 -outfmt 6 -out mygenes.blast.out > log &
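A nohup run can take hours, so a rough progress check can be useful. The sketch below is a hypothetical POSIX shell function, `blast_progress`. It counts the distinct query IDs already written to a tabular (-outfmt 6) output file and compares that number with the number of sequences in the query FASTA. Queries with no hits never appear in tabular output, and BLAST+ may write results in batches, so treat the first number as a lower bound.

```shell
# Hypothetical helper: rough progress of a running BLAST job with -outfmt 6 output.
# Prints "<queries with hits so far>/<queries in FASTA>"; queries without hits are
# never written to tabular output, so the first number is a lower bound.
# Usage: blast_progress <query.fasta> <blast_output>
blast_progress() {
  total=$(grep -c '^>' "$1")
  seen=$(awk 'BEGIN { FS = "\t" }
              /^#/ || NF == 0 { next }          # skip comment and empty lines
              !($1 in ids) { ids[$1] = 1; n++ } # count each query ID once
              END { print n + 0 }' "$2")
  echo "$seen/$total"
}
```

Usage: `$ blast_progress mygenes.fasta mygenes.blast.out`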

### BLASTp restricted to a GI list ###

$ blastp -query mygenes.fasta -db ~/blast/database/ncbi_nr/nr -gilist gi_viridiplantae -outfmt 4 -out mygenes.blast.out

More examples

### Sequence searches using standalone BLAST ###

In January 2011, Annie Archambault (research professional at the QCBS) and Christopher Cameron (professor at Université de Montréal) ran sequence similarity searches comparing 451 spicule matrix proteins from the sea urchin (Strongylocentrotus purpuratus, an echinoderm) that are involved in biomineralization 1) against the genomes of Saccoglossus kowalevskii (a hemichordate that forms biominerals) and Ciona (a tunicate that does not form biominerals).

One challenge was that the Saccoglossus genome was only partially sequenced and not available from GenBank, so BLAST had to be run locally (standalone). Another challenge was organizing the BLAST output, and then sorting the large number of search results (one similarity search for each of the 451 sea urchin protein sequences) into functional categories.

Parsing the BLAST output file

The output file from the 451 searches was too large to inspect by eye. We wrote a small R script that parses the BLAST output file and keeps only the best match for each query sequence, that is, the single hit with the lowest e-value. This R parser for tab-delimited BLAST result files was inspired by a forum post.

Method

To parse the large file quickly, run:

$ Rscript unique_lowest_Evalue.R <inputfile> <outputfile>

Replace <inputfile> with the name of your tab-delimited BLAST result file and <outputfile> with the name you want for the output file.
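The R script itself is not reproduced here. As an alternative sketch, the same "lowest e-value per query" filter can be written as a POSIX shell function around awk. This assumes standard tabular output (-outfmt 6, or -outfmt 7 with its comment lines), with the e-value in column 11 and the bit score in column 12. The function name `best_hits` is hypothetical. This sketch breaks ties on e-value by taking the higher bit score, which is a choice made here and not a documented behaviour of the R script. The function also refuses to overwrite an existing output file.

```shell
# Hypothetical helper: keep the single best hit per query from tabular BLAST output,
# i.e. the lowest e-value (column 11), ties broken by the highest bit score (column 12).
# Queries are printed in the order they first appear in the input.
# Usage: best_hits <inputfile> <outputfile>
best_hits() {
  if [ "$#" -ne 2 ]; then
    echo "usage: best_hits <inputfile> <outputfile>" >&2
    return 2
  fi
  if [ -e "$2" ]; then
    echo "refusing to overwrite existing file: $2" >&2
    return 1
  fi
  awk 'BEGIN { FS = "\t" }
       /^#/ || NF < 12 { next }            # skip -outfmt 7 comments and short lines
       {
         q = $1; e = $11 + 0; b = $12 + 0
         if (!(q in best)) {
           order[++n] = q                   # first hit seen for this query
         } else if (e > ev[q] || (e == ev[q] && b <= bs[q])) {
           next                             # stored hit is at least as good
         }
         best[q] = $0; ev[q] = e; bs[q] = b
       }
       END { for (i = 1; i <= n; i++) print best[order[i]] }' "$1" > "$2"
}
```

Usage: `$ best_hits mygenes.blast.out mygenes.besthits.out`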

Warnings:

  1. Check the number and order of the arguments: an existing file with the same name as <outputfile> will be overwritten.
  2. The script does not currently keep the column headings; any improvement is welcome.
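Tabular -outfmt 6 files have no header line to begin with. If the search used the default 12 columns (no custom format specifiers after `-outfmt 6`), you can add a header afterwards. The hypothetical helper below, `add_blast_header`, writes a new file that starts with the standard BLAST+ column names. Following the first warning above, it will not overwrite an existing file.

```shell
# Hypothetical helper: copy a -outfmt 6 file, prepending the 12 default column names.
# Assumes the default tabular columns were used (no custom -outfmt 6 specifiers).
# Usage: add_blast_header <inputfile> <outputfile>
add_blast_header() {
  if [ "$#" -ne 2 ]; then
    echo "usage: add_blast_header <inputfile> <outputfile>" >&2
    return 2
  fi
  if [ -e "$2" ]; then
    echo "refusing to overwrite existing file: $2" >&2
    return 1
  fi
  {
    printf 'qseqid\tsseqid\tpident\tlength\tmismatch\tgapopen\tqstart\tqend\tsstart\tsend\tevalue\tbitscore\n'
    cat "$1"
  } > "$2"
}
```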

References

1) Mann K, Wilt FH, Poustka AJ (2010) Proteomic analysis of sea urchin (Strongylocentrotus purpuratus) spicule matrix. Proteome Science 8:33. doi:10.1186/1477-5956-8-33