Basic Introduction of BLAST

Basic Introduction of BLAST Jundi Wang School of Computing CSC691 09/08/2013

Transcript of Basic Introduction of BLAST

Page 1: Basic Introduction of BLAST

Basic Introduction of BLAST

Jundi Wang

School of Computing

CSC691

09/08/2013

Page 2: Basic Introduction of BLAST

Overview

1. Introduction of BLAST
   a) Background of BLAST
   b) Programs in BLAST
   c) Function of BLAST
2. Application of BLAST
   a) BLAST web version
   b) Stand-alone BLAST

Page 3: Basic Introduction of BLAST

Background of BLAST

BLAST (Basic Local Alignment Search Tool):
1. The most widely used sequence similarity tool.
2. BLAST is a family of programs that:
   a) Compare protein queries to protein databases
   b) Compare nucleotide queries to nucleotide databases

Page 4: Basic Introduction of BLAST

Background of BLAST

The mechanism of BLAST: finding similar sequences

BLAST finds similar sequences by first locating short matches (words) between the query and a database sequence. Starting from each such match, BLAST then extends the match in both directions to build a local alignment.
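The seed-and-extend idea described above can be sketched in a few lines of Python. This is only a simplified illustration, not the actual BLAST implementation: it uses exact word matches and ungapped extension, and the function names, the scoring values (match=1, mismatch=-2), and the drop-off threshold are assumptions chosen for the example.

```python
def find_seeds(query, subject, word_size=3):
    """Return (query_pos, subject_pos) pairs where a word of length
    word_size occurs exactly in both sequences."""
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)
    seeds = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, i, j, word_size=3,
                match=1, mismatch=-2, drop_off=3):
    """Extend a seed left and right without gaps. Extension stops when the
    running score falls more than drop_off below the best score so far."""

    def extend(qi, sj, step):
        score = best = 0
        length = best_len = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += match if query[qi] == subject[sj] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score > drop_off:
                break
            qi += step
            sj += step
        return best, best_len

    right_score, right_len = extend(i + word_size, j + word_size, 1)
    left_score, left_len = extend(i - 1, j - 1, -1)
    score = word_size * match + left_score + right_score
    aligned_query = query[i - left_len:i + word_size + right_len]
    aligned_subject = subject[j - left_len:j + word_size + right_len]
    return score, aligned_query, aligned_subject


seeds = find_seeds("GATTACA", "CCGATTACACC")
print(seeds[0])                                        # first seed found
print(extend_seed("GATTACA", "CCGATTACACC", *seeds[0]))
```

Real BLAST additionally uses scored neighborhood words for proteins, gapped extension, and statistical filtering of the resulting alignments.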

Page 5: Basic Introduction of BLAST

Programs in BLAST

Several BLAST programs are available for different analytic purposes.

Nucleotide-nucleotide BLAST (blastn)
Given a DNA query, this program returns the most similar DNA sequences from the DNA database that the user specifies.

Protein-protein BLAST (blastp)
Given a protein query, this program returns the most similar protein sequences from the protein database that the user specifies.

Page 6: Basic Introduction of BLAST

Programs in BLAST

Nucleotide 6-frame translation-protein (blastx)
This program compares the six-frame conceptual translation products of a nucleotide query sequence against a protein sequence database.

Nucleotide 6-frame translation-nucleotide 6-frame translation (tblastx)
This program translates the nucleotide query sequence in all six possible frames and compares the translations against the six-frame translations of a nucleotide sequence database.

Page 7: Basic Introduction of BLAST

Programs in BLAST

Protein-nucleotide 6-frame translation (tblastn)
This program compares a protein query against a nucleotide sequence database translated in all six reading frames.
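The five programs can be summarized by the type of the query and the type of the database. The lookup below is a small sketch of that summary; the function name `choose_program` and the `translated` flag are hypothetical names introduced only for this illustration.

```python
# (query type, database type) -> BLAST program, as described on the slides
PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",
    ("protein", "nucleotide"): "tblastn",
}


def choose_program(query_type, db_type, translated=False):
    """Pick a BLAST program name. translated=True asks for a comparison at
    the protein level between a nucleotide query and a nucleotide database."""
    if translated and (query_type, db_type) == ("nucleotide", "nucleotide"):
        return "tblastx"
    return PROGRAMS[(query_type, db_type)]


print(choose_program("nucleotide", "protein"))
print(choose_program("nucleotide", "nucleotide", translated=True))
```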

Page 8: Basic Introduction of BLAST

Six-Frame Translation

Once a gene has been sequenced, it is important to determine the correct open reading frame (ORF). Every region of DNA has six possible reading frames, three on each strand. The reading frame that is used determines which amino acids are encoded by a gene. Typically only one reading frame is used in translating a gene (in eukaryotes). An ORF starts with a start codon (ATG) and ends with a stop codon (TAA, TAG, or TGA).
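A six-frame translation can be sketched as follows, assuming the standard genetic code and an input containing only A, C, G, and T. The function names are illustrative, and codons that cannot be looked up are written as "X".

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; '*' marks a stop."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to translations."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for frame, protein in six_frame_translation("ATGGCCTAA").items():
    print(frame, protein)
```

For the input ATGGCCTAA, frame +1 reads ATG GCC TAA, which is a start codon, one alanine codon, and a stop codon.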

Page 9: Basic Introduction of BLAST

Six-Frame Translation

Example:
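As a small illustration of the ORF definition (ATG through the first in-frame stop codon), the sketch below scans the three forward frames of a sequence. It is a simplified example, not a gene finder: the function name and the `min_codons` parameter are assumptions, and the reverse strand can be covered by passing the reverse complement of the sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (frame, start, end, sequence) for each ORF on the given strand.
    start and end are 0-based, end-exclusive positions in dna."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                # Walk codon by codon until the first in-frame stop codon
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        if (j + 3 - i) // 3 >= min_codons:
                            orfs.append((frame + 1, i, j + 3, dna[i:j + 3]))
                        i = j  # resume scanning after this ORF
                        break
            i += 3
    return orfs


print(find_orfs("CCATGGCCTAAGG"))
```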

Page 10: Basic Introduction of BLAST

Function of BLAST

BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.

Page 11: Basic Introduction of BLAST

Application of BLAST

BLAST web version:

Advantages:
1. It is convenient to operate.
2. The databases are updated synchronously, so searches always use current data.

Weaknesses:
1. It is not well suited to analyzing large-scale data.
2. Users cannot customize the database.

http://www.ncbi.nlm.nih.gov/BLAST/

Page 12: Basic Introduction of BLAST

Application of BLAST

Stand-alone BLAST:

Advantages:
1. It can be used to analyze large-scale data.
2. Users can customize the database.
3. Users can download different versions for different operating systems.

Weaknesses:
1. It is difficult for users who do not have a computer science background.

ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/

Page 13: Basic Introduction of BLAST

Application of BLAST

Statistics in BLAST

1. Score: A value calculated from the matches, substitutions, and gaps in each alignment. A higher score indicates a better alignment.

2. E value: The number of alignments with a score at least this high that are expected to occur in the database by chance. The smaller the E value, the more significant the match.
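The relationship between score and E value follows the Karlin-Altschul statistics, E = K * m * n * exp(-lambda * S), where S is the raw score, m is the query length, and n is the database length. The sketch below works through that arithmetic. The parameters lambda and K depend on the scoring system, and the numeric values used here are illustrative assumptions only.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalize a raw score S to bits: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring at least raw_score:
    E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


# Illustrative parameter values (assumptions, not taken from the slides)
LAM, K = 0.318, 0.134
m, n = 250, 1_000_000

for s in (40, 80):
    print(s, round(bit_score(s, LAM, K), 1), e_value(s, m, n, LAM, K))
```

The same E value can be written with the bit score as E = m * n * 2^(-S'), which is why bit scores can be compared across scoring systems.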

Page 14: Basic Introduction of BLAST

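Application of BLAST

3. Identities: The number (and percentage) of aligned positions at which the query sequence and the database sequence have the same residue.

4. Positives: The number (and percentage) of aligned positions at which the two residues are identical or similar, that is, have a positive score in the scoring matrix.

5. Gaps: The number (and percentage) of aligned positions at which a gap was inserted into the query sequence or the database sequence.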
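These three statistics can be counted directly from a pair of aligned sequences. In the sketch below, the `SIMILAR` set is a small illustrative stand-in for the positive-scoring residue pairs of a real substitution matrix, and the two aligned strings are made-up example data.

```python
# Illustrative subset of similar amino acid pairs (an assumption standing in
# for the positive off-diagonal entries of a real scoring matrix)
SIMILAR = {
    frozenset("IL"), frozenset("IV"), frozenset("LV"), frozenset("KR"),
    frozenset("DE"), frozenset("ST"), frozenset("FY"),
}


def alignment_stats(aligned_query, aligned_subject):
    """Count identities, positives, and gaps; '-' marks a gap position."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    identities = positives = gaps = 0
    for a, b in zip(aligned_query, aligned_subject):
        if a == "-" or b == "-":
            gaps += 1
        elif a == b:
            identities += 1
            positives += 1  # identical residues also count as positives
        elif frozenset((a, b)) in SIMILAR:
            positives += 1
    return {
        "length": len(aligned_query),
        "identities": identities,
        "positives": positives,
        "gaps": gaps,
    }


print(alignment_stats("MKV-LIS", "MRVALLT"))
```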

Page 15: Basic Introduction of BLAST

Application of BLAST (web version)

NCBI BLAST web page

Nucleotide Alignment

Protein Alignment

Page 16: Basic Introduction of BLAST

Application of BLAST (web version)

Query Sequence

Upload File

Query Subrange

Select Database

Page 17: Basic Introduction of BLAST

Application of BLAST (web version)

Select Algorithm

E value limit

Page 18: Basic Introduction of BLAST

Application of BLAST (web version)

Click with the mouse to check the details

Page 19: Basic Introduction of BLAST

Application of BLAST (web version)

100% identity

No gaps

The score value is the result of the scoring matrix

Page 20: Basic Introduction of BLAST

Application of BLAST (web version)

All compared sequences

NCBI Accession ID

Page 21: Basic Introduction of BLAST

Application of BLAST (Stand-alone Version)

1. Download and install stand-alone BLAST:
   ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/
2. Download the database from NCBI:
   ftp://ftp.ncbi.nlm.nih.gov/blast/db/
3. Download and install ActivePerl from ActiveState:
   http://www.activestate.com/activeperl

Page 22: Basic Introduction of BLAST

Application of BLAST (Stand-alone Version)

Build the local database
1. Enter the BLAST folder and create a database (db) folder.
2. Extract the downloaded database into the db folder.

Link the database to BLAST
1. Run cmd.exe and link the database to BLAST using Perl.

Modify the environment variables
1. Set the new path variable so that BLAST is recognized.
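After the environment variables are set, a quick check can confirm that the BLAST programs are visible on the PATH. The sketch below assumes the BLAST+ executable names used on these slides and also reports the `BLASTDB` environment variable, which BLAST+ can use to locate databases. The function name is hypothetical.

```python
import os
import shutil

BLAST_PROGRAMS = ("blastn", "blastp", "blastx", "tblastn", "tblastx")


def check_blast_setup(programs=BLAST_PROGRAMS):
    """Report where each program is found on PATH (None if it is missing)
    and the current value of the BLASTDB environment variable, if any."""
    found = {name: shutil.which(name) for name in programs}
    return {"programs": found, "BLASTDB": os.environ.get("BLASTDB")}


report = check_blast_setup()
for name, path in report["programs"].items():
    print(name, "->", path or "not found on PATH")
print("BLASTDB =", report["BLASTDB"])
```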

Page 23: Basic Introduction of BLAST

Application of BLAST (Stand-alone Version)

Create a query sequence in FASTA format.

The first line starts with ">", followed by the name or description of the query sequence. The sequence itself follows on the next lines.
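A FASTA record can be written and read back with a few lines of Python. This is a minimal sketch: the function names, the 60-character line width, and the example record name are assumptions made for illustration.

```python
def format_fasta(name, sequence, width=60):
    """Return one FASTA record: a '>' header line, then wrapped sequence."""
    lines = [f">{name}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


def parse_fasta(text):
    """Return a list of (name, sequence) tuples from FASTA-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


record = format_fasta("query1 example sequence", "ACGTACGTACGT", width=8)
print(record)
print(parse_fasta(record))
```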

Page 24: Basic Introduction of BLAST

Application of BLAST (Stand-alone Version)

Example: Compare the query sequence with the sequences from the “refseq_rna.00” database.

Different programs in the BLAST package

Link “refseq_rna.00” to BLAST

Name of database

Page 25: Basic Introduction of BLAST

Application of BLAST (Stand-alone Version)

The basic information of the current database

Page 26: Basic Introduction of BLAST

Application of BLAST (Stand-alone Version)

Execute the “blastn” program

Import the query sequence

Import the target database

Report the result in a new file
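The four steps above correspond to a single blastn command line with the `-query`, `-db`, and `-out` options of BLAST+. The sketch below builds and runs that command from Python. The file names `query.fasta` and `result.txt` are hypothetical, and the optional `-evalue` threshold is included only as an illustration.

```python
import subprocess


def build_blastn_command(query_path, database, output_path, evalue=None):
    """Build the argument list for a blastn search."""
    cmd = ["blastn",
           "-query", query_path,   # import the query sequence
           "-db", database,        # import the target database
           "-out", output_path]    # report the result in a new file
    if evalue is not None:
        cmd += ["-evalue", str(evalue)]
    return cmd


def run_blastn(query_path, database, output_path, evalue=None):
    """Run blastn; raises CalledProcessError if the program reports failure."""
    cmd = build_blastn_command(query_path, database, output_path, evalue)
    subprocess.run(cmd, check=True)


# Hypothetical file names; the database name is the one used on the slides
print(" ".join(build_blastn_command("query.fasta", "refseq_rna.00", "result.txt")))
```

Calling `run_blastn` requires that blastn is installed and on the PATH and that the database can be found.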

Page 27: Basic Introduction of BLAST

Application of BLAST (Stand-alone Version)

The length of the compared sequence

NCBI Accession ID

All compared sequences

Statistical evaluation
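The report shown on the slide is the default text output. For further processing, it is often easier to work with tabular output, which BLAST+ writes when the search is run with `-outfmt 6`. The sketch below parses that format, assuming its twelve default columns. The sample line is made-up data for illustration, not a real search result.

```python
# Default column order of BLAST+ tabular output (-outfmt 6)
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")
INT_FIELDS = ("length", "mismatch", "gapopen",
              "qstart", "qend", "sstart", "send")


def parse_tabular(text):
    """Parse tab-separated BLAST hits into a list of dicts."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        hit = dict(zip(FIELDS, line.split("\t")))
        for key in FLOAT_FIELDS:
            hit[key] = float(hit[key])
        for key in INT_FIELDS:
            hit[key] = int(hit[key])
        hits.append(hit)
    return hits


# Made-up example line, for illustration only
sample = "query1\tsubjectA\t100.000\t50\t0\t0\t1\t50\t101\t150\t1e-20\t93.5\n"
for hit in parse_tabular(sample):
    print(hit["sseqid"], hit["pident"], hit["evalue"])
```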

Page 28: Basic Introduction of BLAST

Application of BLAST (Stand-alone Version)

Page 29: Basic Introduction of BLAST

Application of BLAST (Stand-alone Version)

Page 30: Basic Introduction of BLAST

Summary

DNA Sequencing in a new species

NCBI BLAST

Database

Query

Import

Output

Page 31: Basic Introduction of BLAST

Thank You