Tutorial

Running GenomeBlast involves the following four steps:
   1. Choose blastp Options
   2. Upload Genome Sequence Files
   3. Genome Comparison Parameters
   4. Genome Comparison Results

1. Choose blastp Options

A detailed description of the blastp options is available on the NCBI website (http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml). The default values for Expect, Word size, Matrix and Gap costs are the same as those used by regular blastp.
Note that each BLOSUM matrix is used with its own gap costs, e.g., BLOSUM62 (Gap costs: existence=11, extension=1) or BLOSUM45 (Gap costs: existence=15, extension=2).
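
The pairing of matrix and gap costs can be pictured as a small lookup table. The Python sketch below is illustrative only and is not part of GenomeBlast: the names DEFAULT_GAP_COSTS and gap_costs_for are hypothetical, and the table holds just the two combinations quoted above.

```python
# Hypothetical lookup table; it contains only the two pairs quoted in the text.
DEFAULT_GAP_COSTS = {
    "BLOSUM62": (11, 1),  # (existence, extension)
    "BLOSUM45": (15, 2),
}


def gap_costs_for(matrix):
    """Return (existence, extension) for a listed matrix, else raise ValueError."""
    try:
        return DEFAULT_GAP_COSTS[matrix.upper()]
    except KeyError:
        raise ValueError(f"no gap costs recorded for matrix {matrix!r}") from None


existence, extension = gap_costs_for("BLOSUM62")
print(f"BLOSUM62: existence={existence}, extension={extension}")
```

Raising an error for an unlisted matrix, rather than silently falling back to some default, is a design choice made for this sketch.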

2. Upload Genome Sequence Files

First specify the number of genomes to compare, then click OK. One upload entry appears for each file. To upload a file, click Browse and locate it. Select the file by double-clicking it, or highlight it and click the button "open", to place the file location in the entry provided. Once all files have been located, press the button "Run genome-to-genome Blast."
The inputs are genome sequence files in the GenBank format. Each genome sequence file must include coding sequence (CDS) annotations in the feature table.
To reset the number of genomes for comparison, click the Reset button.
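
Before uploading, it can be useful to confirm that a file really contains CDS annotations. The Python sketch below is a simplified, line-based check and not a full GenBank parser; the function name count_cds_features and the sample record are made up for illustration. It relies on the GenBank flat-file convention that feature keys start in column 6.

```python
def count_cds_features(genbank_text):
    """Count CDS entries in the FEATURES tables of GenBank flat-file text."""
    in_features = False
    count = 0
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
        elif line.startswith(("ORIGIN", "CONTIG", "//")):
            in_features = False
        elif in_features and line.startswith("     CDS "):
            # Feature keys are preceded by exactly five spaces;
            # qualifier lines are indented further and are skipped.
            count += 1
    return count


# Made-up record, used only to exercise the function.
SAMPLE = """\
LOCUS       TEST0001                 60 bp    DNA     linear   BCT 01-JAN-2000
FEATURES             Location/Qualifiers
     source          1..60
     gene            1..30
     CDS             1..30
                     /product="hypothetical protein"
     CDS             complement(31..60)
ORIGIN
        1 atgaaacgca ttagcaccac cattaccacc accatcacca ttaccacagg taacggtgcg
//
"""

print(count_cds_features(SAMPLE))  # 2
```

A count of zero suggests that the file lacks the CDS annotations that GenomeBlast needs. For anything beyond a quick check, a dedicated GenBank parser would be more robust.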

3. Genome Comparison Parameters

Once the blast search is done, the next step is to select the options below, which are used to identify homologous genes/CDS.
        Coverage is the total length of the aligned segments as a percentage of the query
           sequence length. The default threshold is 50%.
        Identity is the proportion (%) of identical amino acid pairs in the aligned region. The
           default threshold is 30%.
        E-value, or expectation value, is the number of different alignments with scores
           equivalent to or better than the observed score that are expected to occur in a
           database search by chance. The default threshold is 10.
You can also disable the pairwise blast preview pictures. By default, previews are shown.
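
The three thresholds can be read as a simple filter on each blast hit. The Python sketch below is one possible reading, not GenomeBlast's actual code: the Hit class, its field names, and the function passes_thresholds are hypothetical, and treating the thresholds as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One blastp hit; the field names are illustrative only."""
    query_length: int      # residues in the query protein
    query_aligned: int     # query residues covered by the alignment
    alignment_length: int  # alignment columns, gaps included
    identities: int        # identical amino acid pairs
    evalue: float


def passes_thresholds(hit, min_coverage=50.0, min_identity=30.0, max_evalue=10.0):
    """Apply the coverage, identity and E-value thresholds (assumed inclusive)."""
    coverage = 100.0 * hit.query_aligned / hit.query_length
    identity = 100.0 * hit.identities / hit.alignment_length
    return (coverage >= min_coverage
            and identity >= min_identity
            and hit.evalue <= max_evalue)


# Made-up hits: the first has 60% coverage and 40% identity,
# the second has only 40% coverage.
hits = [
    Hit(query_length=200, query_aligned=120, alignment_length=125,
        identities=50, evalue=1e-20),
    Hit(query_length=200, query_aligned=80, alignment_length=82,
        identities=60, evalue=1e-5),
]
for hit in hits:
    print(passes_thresholds(hit))
```

With the default thresholds, the first hit is kept and the second is rejected because its coverage falls below 50%.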

4. Genome Comparison Results

On the results page you will find four sections:
        The first section shows the parameters, thresholds, and genomes provided.
        The second section lets you find unique genes and homologous genes.
             a. Check the genomes that you want to compare, and click the button "find" to
                 view the result.
             b. The page for Unique Genes lists the unique gene candidates found in each genome.
                 Click an accession number to see the NCBI entry. You can download a text
                 file with the unique genes listed for each genome.
             c. The page for Homologous Genes lists the homologous genes found in the genomes
                 compared.
                 Click an accession number to see the NCBI entry. You can download a text
                 file with the homologous genes listed for each genome.
        The third section displays the pairwise comparison results. There are two ways to
           view a 2D plot: select it from the drop-down box, or click the link above the plot.
           Dots on the image correspond to genes; click a dot to view the query sequence,
           subject sequence, and identity value.
        The last section shows the comparison results for all genomes provided. Three buttons
           lead you to the results.
             a. A phylogenetic tree, which shows the relationship among the genomes. The page
                 includes a description of the algorithm and a tree file corresponding to the tree.
             b. Gene presence and absence table. You can download this matrix in the
                 PHYLIP format.
             c. Genome comparison summary table, which summarizes the comparison results.
                 "Other genes" here means genes not defined as unique genes or homologous genes.
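
The gene presence and absence table in item b can be pictured as a matrix of 0s and 1s, one row per genome. The Python sketch below writes such a matrix in the classic PHYLIP discrete-character layout (a header with the number of taxa and characters, then 10-character names followed by the states). The function name to_phylip and the example data are made up, and whether the file downloaded from GenomeBlast matches this layout exactly is an assumption.

```python
def to_phylip(presence):
    """Format {genome_name: [0/1, ...]} as a PHYLIP discrete-character matrix."""
    if not presence:
        raise ValueError("at least one genome is required")
    names = list(presence)
    n_chars = len(presence[names[0]])
    lines = [f"{len(names):>5} {n_chars:>5}"]
    for name in names:
        row = presence[name]
        if len(row) != n_chars:
            raise ValueError("all genomes need the same number of gene columns")
        # Classic PHYLIP names occupy exactly 10 characters:
        # longer names are truncated, shorter ones padded with spaces.
        states = "".join("1" if present else "0" for present in row)
        lines.append(name[:10].ljust(10) + states)
    return "\n".join(lines) + "\n"


# Made-up example: three genomes scored for four genes (1 = present).
table = {
    "GenomeA": [1, 0, 1, 1],
    "GenomeB": [1, 1, 0, 1],
    "GenomeC": [0, 1, 1, 1],
}
print(to_phylip(table), end="")
```

Because names are cut to 10 characters, genomes whose names share the same first 10 characters would need to be renamed before using a sketch like this.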