BLAST: Basic Local Alignment Search Tool
Date posted: 22-Dec-2015
BLAST: the fishing rod
BLAST (Basic Local Alignment Search Tool) allows rapid comparison of a query sequence of nucleotides or amino acids (the bait on the rod) against a database (the big sea).
For successful fishing, choose the rod, the bait, and the body of water to suit the biological question.
Comparing the query sequence to known sequences in databases is fundamental to understanding the relatedness of any query sequence to other known proteins or DNA sequences.
Applications include:
• Identifying shared similarities with sequences already deposited in the databanks (orthologs and paralogs)
• Discovering new genes or proteins (ascertaining the existence of a putative ORF)
• Discovering variants of genes or proteins
• Identifying functional motifs shared with other proteins
• Investigating expressed sequence tags (ESTs)
• Exploring protein structure and function
Why use local alignment for database searches?
Local alignment is a useful approach to database searching because many query sequences have domains, active sites, or other motifs that share local, but not global, regions of similarity with other sequences.
The BLAST algorithm
(1) For the query, find the list of high-scoring words of length w.
Given a query sequence of length L: for each word of length w in the query, find the list of words that score at least T against it when scored with a pair-score matrix (e.g. PAM250, BLOSUM).
(2) Compare the word list to the database and identify exact matches.
[Figure: exact matches between words from the word list and a database sequence]
(3) For each word match, extend the alignment in both directions to find alignments that score above a threshold S. These are the maximal segment pairs (MSPs).
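The three steps above can be sketched in a few lines of Python. This is a toy illustration, not BLAST's implementation: it assumes a DNA alphabet, a match/mismatch score of +2/-1 in place of a PAM or BLOSUM matrix, ungapped extension only, and a drop-off parameter X that stops the extension once the score falls too far below its best value. The function names and the default values of w, T, S and X are chosen for the example.

```python
from itertools import product

ALPHABET = "ACGT"
MATCH, MISMATCH = 2, -1  # toy pair scores (assumption; stands in for PAM/BLOSUM)


def pair_score(a, b):
    return MATCH if a == b else MISMATCH


def word_score(u, v):
    return sum(pair_score(a, b) for a, b in zip(u, v))


def neighborhood(query, w, T):
    """Step 1: every word scoring >= T against a query word, with the query offsets."""
    words = {}
    for i in range(len(query) - w + 1):
        q_word = query[i:i + w]
        for cand in map("".join, product(ALPHABET, repeat=w)):
            if word_score(q_word, cand) >= T:
                words.setdefault(cand, []).append(i)
    return words


def find_seeds(words, subject, w):
    """Step 2: exact matches between listed words and the database sequence."""
    for j in range(len(subject) - w + 1):
        for i in words.get(subject[j:j + w], []):
            yield i, j


def extend(query, subject, i, j, w, X):
    """Step 3: ungapped extension in both directions from the seed at (i, j)."""
    score = best = word_score(query[i:i + w], subject[j:j + w])
    right = k = 0
    while i + w + k < len(query) and j + w + k < len(subject):
        score += pair_score(query[i + w + k], subject[j + w + k])
        k += 1
        if score > best:
            best, right = score, k
        elif best - score > X:
            break
    score, left, k = best, 0, 0
    while i - k - 1 >= 0 and j - k - 1 >= 0:
        score += pair_score(query[i - k - 1], subject[j - k - 1])
        k += 1
        if score > best:
            best, left = score, k
        elif best - score > X:
            break
    # (score, query start, query end, subject start)
    return best, i - left, i + w + right, j - left


def search(query, subject, w=3, T=3, S=10, X=4):
    """Run steps 1-3 and keep the segment pairs scoring at least S."""
    words = neighborhood(query, w, T)
    hits = {extend(query, subject, i, j, w, X)
            for i, j in find_seeds(words, subject, w)}
    return sorted(h for h in hits if h[0] >= S)
```

For the toy inputs search("ACGTACGTGG", "TTTACGTACGTGGTTT"), the best hit covers the whole query (positions 0 to 10) at subject offset 3 with a score of 20. Lowering T enlarges the word list and finds more seeds, at the cost of more extensions to run.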
BLAST is a heuristic algorithm
The full query sequence is not compared against the full length of every sequence in the database; instead, an approximate search covers only part of the search space.
Speed vs. sensitivity: BLAST does not find ALL of the best matches, so false negatives occur.
How do we evaluate the results obtained?
Raw score: the raw score S of an alignment is usually calculated by summing the scores for matches, mismatches, and gaps in the alignment.
Normalized score (bits): bit scores from different alignments, even those employing different scoring matrices, can be compared. The higher the score, the better the alignment, but the significance of an alignment cannot be deduced from the score alone.
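A small sketch of the two scores may help. The raw score below uses made-up match, mismatch and gap values (a linear gap penalty, assumed for simplicity). The bit score uses the standard Karlin-Altschul normalization S' = (lambda * S - ln K) / ln 2, which is not spelled out in the slides; lambda and K depend on the scoring system, and the values in the example call are illustrative only.

```python
import math


def raw_score(aligned_query, aligned_subject, match=2, mismatch=-3, gap=-5):
    """Sum per-column scores of an alignment; '-' marks a gap (toy values)."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for a, b in zip(aligned_query, aligned_subject):
        if a == "-" or b == "-":
            total += gap
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


def bit_score(raw, lam, K):
    """Normalize a raw score to bits: (lambda * S - ln K) / ln 2."""
    return (lam * raw - math.log(K)) / math.log(2)


# Illustrative call; lam and K here are assumed values, not taken from the slides.
s = raw_score("ACGTTAGC", "ACGTTAGC")
print(s, round(bit_score(s, lam=0.267, K=0.041), 1))
```

Because lambda and K absorb the scale of the scoring matrix, two alignments scored with different matrices become comparable once both are expressed in bits.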
E-value (Expectation value)
• An Expect value of 10 for a match means that, in a database of the current size, one might expect to see 10 matches with a similar or better score simply by chance.
• The E-value is the most commonly used threshold in database searches. Only hits with E-values smaller than the set threshold are reported in the output.
• Raising the E-value threshold lets you see hits that may be biologically related even though they are statistically insignificant.
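To make the threshold concrete, here is a minimal sketch that converts bit scores to E-values and reports only the hits below a cutoff. It assumes the standard relation E = m * n * 2^(-S'), where m is the query length, n is the total database length, and S' is the bit score; edge-effect corrections are ignored, and the hit names are hypothetical.

```python
def e_value(bits, query_len, db_len):
    """Expected number of chance hits scoring at least `bits` in this search space."""
    return query_len * db_len * 2.0 ** (-bits)


def report(hits, query_len, db_len, threshold=10.0):
    """Keep (name, bit score) hits whose E-value is smaller than threshold, best first."""
    scored = [(name, e_value(bits, query_len, db_len)) for name, bits in hits]
    return sorted((h for h in scored if h[1] < threshold), key=lambda h: h[1])


# Hypothetical hits: (name, bit score)
hits = [("hitA", 40.0), ("hitB", 18.0)]
print(report(hits, query_len=1024, db_len=1024, threshold=1.0))
print(report(hits, query_len=1024, db_len=1024, threshold=10.0))
```

With the stricter threshold only the stronger hit survives; raising the threshold to 10 also reports the weaker one. The same bit score also yields a larger E-value in a larger database, which is why the definition refers to the database's current size.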
To evaluate the alignment:
• Examine the statistical parameters: normalized score, E-value, % identity, % similarity, % gaps
• Examine the alignment itself
• Use biological common sense
Don't rely only on statistical significance.
You can't see the forest for the trees
Too many repeats of the same highly significant sequences hide sequences with lower similarity that may also be interesting.
What can we do if there are too many matches?
• Limit the database
• Limit the organism
• Filter reported entries by keyword
• (Limit to a specific domain)
• Change the matrix and/or gap penalties
• Change the E-value
• Add a filter for low complexity
Enumerating the different options
What can we do if there are hardly any matches?
• Check the choice of database
• Check the choice of organism
• Remove the filter for low complexity
• Change the matrix or gap penalties
• Increase the E-value
DNA vs. protein searches
If we have a nucleotide sequence, should we search the DNA databases only? Or should we translate it to protein and search protein databases?
Translating causes loss of information, but protein sequences are more conserved than DNA sequences.
It is therefore advisable to translate a nucleotide sequence to protein and search protein databases for homology.
[Diagram: query (DNA or protein) versus database (DNA or protein)]
Why use a nucleotide sequence after all?
• No ORF was found
• No similar protein sequences were found
• Specific DNA databases are available (e.g. ESTs)
• To find duplicated genes in a genome
• To find pseudogenes
• To find the location of non-protein-coding genes in the genome (siRNA, etc.)
BLAST flavors
• BlastN: nt versus nt database
• BlastP: protein versus protein database
• BlastX: translated nt (6 frames) versus protein database
• tBlastN: protein versus translated nt database (6 frames)
• tBlastX: translated nt versus translated nt database (both 6 frames)
[Diagram: query (DNA or protein) versus DB (DNA or protein)]
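The flavor list amounts to a lookup on the type of the query, the type of the database, and whether a DNA-versus-DNA comparison should be made at the protein level. A minimal sketch follows; the function name and the translate_both flag are invented for this example, and the returned strings are simply the program names in lower case.

```python
# (query type, database type, compare at protein level?) -> program
PROGRAMS = {
    ("dna", "dna", False): "blastn",
    ("protein", "protein", False): "blastp",
    ("dna", "protein", False): "blastx",
    ("protein", "dna", False): "tblastn",
    ("dna", "dna", True): "tblastx",
}


def choose_program(query, database, translate_both=False):
    """Pick a BLAST flavor for the given query and database types."""
    key = (query.lower(), database.lower(), translate_both)
    if key not in PROGRAMS:
        raise ValueError(f"no BLAST flavor for {key}")
    return PROGRAMS[key]


print(choose_program("DNA", "protein"))
print(choose_program("DNA", "DNA", translate_both=True))
```

The two calls select blastx and tblastx respectively; a combination that makes no sense, such as translating a protein query, raises an error instead of guessing.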
Uses of BLAST programs
BLASTx
BLASTx compares a nucleotide query sequence, translated in all reading frames, against a protein sequence database (DNA query vs. protein database).
If you have a DNA sequence and want to know what protein (if any) it encodes, you can perform a BLASTx search.
tBLASTn
tBLASTn compares a protein query sequence against a nucleotide sequence database that is translated in all reading frames (protein query vs. DNA database).
You can use this program to ask whether a DNA or EST database contains a nucleotide sequence encoding a protein that matches your protein of interest.
tBLASTx
tBLASTx translates the DNA query and compares it to a database of DNA sequences, with both translated in all reading frames (DNA query vs. DNA database).
(The nr database cannot be used because it is too large.)
Use it to determine whether an entire DNA database contains genes that encode proteins similar to your query, if BLASTx or tBLASTn fail.
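"Translated in all reading frames" means six translations: three codon offsets on the given strand and three on its reverse complement. The sketch below shows that step alone, assuming the standard genetic code and an uppercase or lowercase A/C/G/T input; codons containing any other character are translated as X. The frame labels (+1 to +3, -1 to -3) are a naming choice for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' is stop, 'X' is unknown."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Return the six conceptual translations keyed by frame label."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for frame, protein in six_frames("ATGGCCTAA").items():
    print(frame, protein)
```

For ATGGCCTAA, frame +1 gives MA* (methionine, alanine, stop) and frame -1 gives LGH. A translated search scores each of these six protein strings separately, which is why BLASTx, tBLASTn and tBLASTx cost more than a plain nucleotide or protein comparison.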
-
BLAST החכה
BLAST (Basic Local Alignment Search Tool)allows rapid sequence comparison of a query sequence פיתיון בחכהה (nucleotides or amino acids)רצף שאילתא]] against a database להים הגדו
לצורך דיג מוצלח
יש לבחור חכה פיתיון ומקווה מים בהתאם לשאלה הביולוגית
Comparing the query sequence to known sequences in databases is fundamental to understanding the relatedness of any query sequence to other known proteins or DNA sequences
Applications include
bull Identifying shared similarities with sequences already
deposited in the databanks (orthologs and paralogs)
bull Discovering new genes or proteins (ascertaining
existence of a putative ORF)
bull Discovering variants of genes or proteins
bullIdentifying functional motifs shared with other proteins
bull Investigating expressed sequence tags (ESTs)
bull Exploring protein structure and function
Why use local alignment for database searches
Local alignment is a useful approach to
DB searching because many query
sequences have domains active sites or
other motifs that have local but not
global regions of similarity to other sequences
BLAST(1) for the query find the list of high scoring words of length w
Query Sequence of length L
For each word from the query sequence find the list of words that will score at least T when scored using a pair-score matrix(eg PAM 250 BLOSUM)
BLAST (cont)(2) Compare the word list to the database and identify exact matches
WordList
Exact matches of words from word lists
databasesequence
(3) For each word match extend the alignment in both directions to find alignments that score greater than a threshold of value S
maximal segment pairs (MSPs)
Blast is a heuristic algorythm
לא משווים את מלוא רצף השאילתאלמלוא האורך של כא מן הרצפים במאגר
אלא מבצעים חיפוש (מרחב החיפוש)חלקי עס קירוב
Speed vs sensitivityDoes not find ALL best matches
False negativesכיצד נעריך את הממצאים המתקבלים
Raw score S of the alignment is usuallycalculated by summing the scores formatches mismatches and gaps in thealignment
Normalized score (bits) - bit scores fromdifferent alignments even those employingdifferent scoring matrices can be comparedThe higher the score the better the alignmentbut the significance of an alignment can notbe deduced from the score alone
E-value (Expectation value)bull Expect value of 10 for a match means in a
database of current size one might expect to see 10 matches with a similar or better score simply by chance alone
bull E-value is the most commonly used threshold in
database searches Only those hits with E-values
smaller than the set threshold will be reported in
the output
bull Increasing the E-value enables you to see
biologically related sequences but statistically
insignificant
To evaluate the alignmentbull Examine statistical parameters1048707Normalized score1048707E value1048707 identity1048707 similarity1048707 gapsbull Examine the alignment itselfbull Use biological common senseDonrsquot rely only on statistical significance
מרוב עצים לא רואים את היער
יותר מידי חזרות על אותם רצפים בעלי מובהקות גבוהה לא רואים רצפים בעלי דמיון נמוך יותר
שעשויים אף הם להיות מעניינים
What can we do if there are too many matches
bullLimit DB
bullLimit organism
bullFilter reported entries by keyword
bull(Limit to a specific domain)
bullChange matrix andor gap penalties
bullChange E-value
bullAdd filter for low complexity
ספירת האפשרויות השונות
What can we do if there are hardly
any matches
bullCheck choice of DB
bullCheck choice of organism
bullRemove filter for low complexity
bullChange matrix or gap penalties
bullIncrease E-value
DNA vs Protein searchesIf we have a nucleotide sequence should we search the
DNA databases only Or should we translate it to protein and search protein databases
Translating causes loss of information but protein sequence is more conserved than DNA sequence
It is therefore advisable to translate a nucleotide sequence to protein and search protein databases for homology
Query DNA Protein
Database DNA Protein
No ORF found No similar protein sequences were found Specific DNA databases are available (EST)
To find duplicated genes in a genome
To find pseudogenes
To find the location of non-protein coding genes
in the genome (siRNA etc)
Why use a nucleotide sequence after all
Blast flavors
BlastN - nt versus nt database BlastP - protein versus protein database BlastX - translated nt (6 frames) versus protein database tBlastN - protein versus translated nt database (6 frames) tBlastX - translated nt versus translated nt database (both 6 frames)
Query DNA Protein
DB DNA Protein
Uses of BLAST programs
BLASTx ndash compares a nucleotide query seq translated in all reading frames against a prot seq db
DNA protein
If you have a DNA seq and you want to now what protein (if any) it encodes you can perform BLASTx search
tBLASTn
tBLASTn ndash compares a protein query seq against a nucleotide seq db which is translated in all reading frames
Protein DNA
You can use this program to ask whether a DNA or ESTs db contains a nuc seq encoding a protein that matches your protein of interest
tBLASTx
tBLASTx ndash translates DNA from query and compares it to db of DNA seqs all translated to all reading frames
DNA DNA
(nr db cannot be used because itrsquos too large)
Used to determine whether an entire DNA db contains genes that encodes proteins similar to your query (If blastx or tblastn fail)
E-value
- Slide 23
- Slide 24
- Slide 25
-
Comparing the query sequence to known sequences in databases is fundamental to understanding the relatedness of any query sequence to other known proteins or DNA sequences
Applications include
bull Identifying shared similarities with sequences already
deposited in the databanks (orthologs and paralogs)
bull Discovering new genes or proteins (ascertaining
existence of a putative ORF)
bull Discovering variants of genes or proteins
bullIdentifying functional motifs shared with other proteins
bull Investigating expressed sequence tags (ESTs)
bull Exploring protein structure and function
Why use local alignment for database searches
Local alignment is a useful approach to
DB searching because many query
sequences have domains active sites or
other motifs that have local but not
global regions of similarity to other sequences
BLAST(1) for the query find the list of high scoring words of length w
Query Sequence of length L
For each word from the query sequence find the list of words that will score at least T when scored using a pair-score matrix(eg PAM 250 BLOSUM)
BLAST (cont)(2) Compare the word list to the database and identify exact matches
WordList
Exact matches of words from word lists
databasesequence
(3) For each word match extend the alignment in both directions to find alignments that score greater than a threshold of value S
maximal segment pairs (MSPs)
Blast is a heuristic algorythm
לא משווים את מלוא רצף השאילתאלמלוא האורך של כא מן הרצפים במאגר
אלא מבצעים חיפוש (מרחב החיפוש)חלקי עס קירוב
Speed vs sensitivityDoes not find ALL best matches
False negativesכיצד נעריך את הממצאים המתקבלים
Raw score S of the alignment is usuallycalculated by summing the scores formatches mismatches and gaps in thealignment
Normalized score (bits) - bit scores fromdifferent alignments even those employingdifferent scoring matrices can be comparedThe higher the score the better the alignmentbut the significance of an alignment can notbe deduced from the score alone
E-value (Expectation value)bull Expect value of 10 for a match means in a
database of current size one might expect to see 10 matches with a similar or better score simply by chance alone
bull E-value is the most commonly used threshold in
database searches Only those hits with E-values
smaller than the set threshold will be reported in
the output
bull Increasing the E-value enables you to see
biologically related sequences but statistically
insignificant
To evaluate the alignmentbull Examine statistical parameters1048707Normalized score1048707E value1048707 identity1048707 similarity1048707 gapsbull Examine the alignment itselfbull Use biological common senseDonrsquot rely only on statistical significance
מרוב עצים לא רואים את היער
יותר מידי חזרות על אותם רצפים בעלי מובהקות גבוהה לא רואים רצפים בעלי דמיון נמוך יותר
שעשויים אף הם להיות מעניינים
What can we do if there are too many matches
bullLimit DB
bullLimit organism
bullFilter reported entries by keyword
bull(Limit to a specific domain)
bullChange matrix andor gap penalties
bullChange E-value
bullAdd filter for low complexity
ספירת האפשרויות השונות
What can we do if there are hardly
any matches
bullCheck choice of DB
bullCheck choice of organism
bullRemove filter for low complexity
bullChange matrix or gap penalties
bullIncrease E-value
DNA vs Protein searchesIf we have a nucleotide sequence should we search the
DNA databases only Or should we translate it to protein and search protein databases
Translating causes loss of information but protein sequence is more conserved than DNA sequence
It is therefore advisable to translate a nucleotide sequence to protein and search protein databases for homology
Query DNA Protein
Database DNA Protein
No ORF found No similar protein sequences were found Specific DNA databases are available (EST)
To find duplicated genes in a genome
To find pseudogenes
To find the location of non-protein coding genes
in the genome (siRNA etc)
Why use a nucleotide sequence after all
Blast flavors
BlastN - nt versus nt database BlastP - protein versus protein database BlastX - translated nt (6 frames) versus protein database tBlastN - protein versus translated nt database (6 frames) tBlastX - translated nt versus translated nt database (both 6 frames)
Query DNA Protein
DB DNA Protein
Uses of BLAST programs
BLASTx ndash compares a nucleotide query seq translated in all reading frames against a prot seq db
DNA protein
If you have a DNA seq and you want to now what protein (if any) it encodes you can perform BLASTx search
tBLASTn
tBLASTn ndash compares a protein query seq against a nucleotide seq db which is translated in all reading frames
Protein DNA
You can use this program to ask whether a DNA or ESTs db contains a nuc seq encoding a protein that matches your protein of interest
tBLASTx
tBLASTx ndash translates DNA from query and compares it to db of DNA seqs all translated to all reading frames
DNA DNA
(nr db cannot be used because itrsquos too large)
Used to determine whether an entire DNA db contains genes that encodes proteins similar to your query (If blastx or tblastn fail)
E-value
- Slide 23
- Slide 24
- Slide 25
-
Why use local alignment for database searches
Local alignment is a useful approach to
DB searching because many query
sequences have domains active sites or
other motifs that have local but not
global regions of similarity to other sequences
BLAST(1) for the query find the list of high scoring words of length w
Query Sequence of length L
For each word from the query sequence find the list of words that will score at least T when scored using a pair-score matrix(eg PAM 250 BLOSUM)
BLAST (cont)(2) Compare the word list to the database and identify exact matches
WordList
Exact matches of words from word lists
databasesequence
(3) For each word match extend the alignment in both directions to find alignments that score greater than a threshold of value S
maximal segment pairs (MSPs)
Blast is a heuristic algorythm
לא משווים את מלוא רצף השאילתאלמלוא האורך של כא מן הרצפים במאגר
אלא מבצעים חיפוש (מרחב החיפוש)חלקי עס קירוב
Speed vs sensitivityDoes not find ALL best matches
False negativesכיצד נעריך את הממצאים המתקבלים
Raw score S of the alignment is usuallycalculated by summing the scores formatches mismatches and gaps in thealignment
Normalized score (bits) - bit scores fromdifferent alignments even those employingdifferent scoring matrices can be comparedThe higher the score the better the alignmentbut the significance of an alignment can notbe deduced from the score alone
E-value (Expectation value)bull Expect value of 10 for a match means in a
database of current size one might expect to see 10 matches with a similar or better score simply by chance alone
bull E-value is the most commonly used threshold in
database searches Only those hits with E-values
smaller than the set threshold will be reported in
the output
bull Increasing the E-value enables you to see
biologically related sequences but statistically
insignificant
To evaluate the alignmentbull Examine statistical parameters1048707Normalized score1048707E value1048707 identity1048707 similarity1048707 gapsbull Examine the alignment itselfbull Use biological common senseDonrsquot rely only on statistical significance
מרוב עצים לא רואים את היער
יותר מידי חזרות על אותם רצפים בעלי מובהקות גבוהה לא רואים רצפים בעלי דמיון נמוך יותר
שעשויים אף הם להיות מעניינים
What can we do if there are too many matches
bullLimit DB
bullLimit organism
bullFilter reported entries by keyword
bull(Limit to a specific domain)
bullChange matrix andor gap penalties
bullChange E-value
bullAdd filter for low complexity
ספירת האפשרויות השונות
What can we do if there are hardly
any matches
bullCheck choice of DB
bullCheck choice of organism
bullRemove filter for low complexity
bullChange matrix or gap penalties
bullIncrease E-value
DNA vs Protein searchesIf we have a nucleotide sequence should we search the
DNA databases only Or should we translate it to protein and search protein databases
Translating causes loss of information but protein sequence is more conserved than DNA sequence
It is therefore advisable to translate a nucleotide sequence to protein and search protein databases for homology
Query DNA Protein
Database DNA Protein
No ORF found No similar protein sequences were found Specific DNA databases are available (EST)
To find duplicated genes in a genome
To find pseudogenes
To find the location of non-protein coding genes
in the genome (siRNA etc)
Why use a nucleotide sequence after all
Blast flavors
BlastN - nt versus nt database BlastP - protein versus protein database BlastX - translated nt (6 frames) versus protein database tBlastN - protein versus translated nt database (6 frames) tBlastX - translated nt versus translated nt database (both 6 frames)
Query DNA Protein
DB DNA Protein
Uses of BLAST programs
BLASTx ndash compares a nucleotide query seq translated in all reading frames against a prot seq db
DNA protein
If you have a DNA seq and you want to now what protein (if any) it encodes you can perform BLASTx search
tBLASTn
tBLASTn ndash compares a protein query seq against a nucleotide seq db which is translated in all reading frames
Protein DNA
You can use this program to ask whether a DNA or ESTs db contains a nuc seq encoding a protein that matches your protein of interest
tBLASTx
tBLASTx ndash translates DNA from query and compares it to db of DNA seqs all translated to all reading frames
DNA DNA
(nr db cannot be used because itrsquos too large)
Used to determine whether an entire DNA db contains genes that encodes proteins similar to your query (If blastx or tblastn fail)
E-value
- Slide 23
- Slide 24
- Slide 25
-
BLAST(1) for the query find the list of high scoring words of length w
Query Sequence of length L
For each word from the query sequence find the list of words that will score at least T when scored using a pair-score matrix(eg PAM 250 BLOSUM)
BLAST (cont)(2) Compare the word list to the database and identify exact matches
WordList
Exact matches of words from word lists
databasesequence
(3) For each word match extend the alignment in both directions to find alignments that score greater than a threshold of value S
maximal segment pairs (MSPs)
Blast is a heuristic algorythm
לא משווים את מלוא רצף השאילתאלמלוא האורך של כא מן הרצפים במאגר
אלא מבצעים חיפוש (מרחב החיפוש)חלקי עס קירוב
Speed vs sensitivityDoes not find ALL best matches
False negativesכיצד נעריך את הממצאים המתקבלים
Raw score S of the alignment is usuallycalculated by summing the scores formatches mismatches and gaps in thealignment
Normalized score (bits) - bit scores fromdifferent alignments even those employingdifferent scoring matrices can be comparedThe higher the score the better the alignmentbut the significance of an alignment can notbe deduced from the score alone
E-value (Expectation value)bull Expect value of 10 for a match means in a
database of current size one might expect to see 10 matches with a similar or better score simply by chance alone
bull E-value is the most commonly used threshold in
database searches Only those hits with E-values
smaller than the set threshold will be reported in
the output
bull Increasing the E-value enables you to see
biologically related sequences but statistically
insignificant
To evaluate the alignmentbull Examine statistical parameters1048707Normalized score1048707E value1048707 identity1048707 similarity1048707 gapsbull Examine the alignment itselfbull Use biological common senseDonrsquot rely only on statistical significance
מרוב עצים לא רואים את היער
יותר מידי חזרות על אותם רצפים בעלי מובהקות גבוהה לא רואים רצפים בעלי דמיון נמוך יותר
שעשויים אף הם להיות מעניינים
What can we do if there are too many matches
bullLimit DB
bullLimit organism
bullFilter reported entries by keyword
bull(Limit to a specific domain)
bullChange matrix andor gap penalties
bullChange E-value
bullAdd filter for low complexity
ספירת האפשרויות השונות
What can we do if there are hardly
any matches
bullCheck choice of DB
bullCheck choice of organism
bullRemove filter for low complexity
bullChange matrix or gap penalties
bullIncrease E-value
DNA vs Protein searchesIf we have a nucleotide sequence should we search the
DNA databases only Or should we translate it to protein and search protein databases
Translating causes loss of information but protein sequence is more conserved than DNA sequence
It is therefore advisable to translate a nucleotide sequence to protein and search protein databases for homology
Query DNA Protein
Database DNA Protein
No ORF found No similar protein sequences were found Specific DNA databases are available (EST)
To find duplicated genes in a genome
To find pseudogenes
To find the location of non-protein coding genes
in the genome (siRNA etc)
Why use a nucleotide sequence after all
Blast flavors
BlastN - nt versus nt database BlastP - protein versus protein database BlastX - translated nt (6 frames) versus protein database tBlastN - protein versus translated nt database (6 frames) tBlastX - translated nt versus translated nt database (both 6 frames)
Query DNA Protein
DB DNA Protein
Uses of BLAST programs
BLASTx ndash compares a nucleotide query seq translated in all reading frames against a prot seq db
DNA protein
If you have a DNA seq and you want to now what protein (if any) it encodes you can perform BLASTx search
tBLASTn
tBLASTn ndash compares a protein query seq against a nucleotide seq db which is translated in all reading frames
Protein DNA
You can use this program to ask whether a DNA or ESTs db contains a nuc seq encoding a protein that matches your protein of interest
tBLASTx
tBLASTx ndash translates DNA from query and compares it to db of DNA seqs all translated to all reading frames
DNA DNA
(nr db cannot be used because itrsquos too large)
Used to determine whether an entire DNA db contains genes that encodes proteins similar to your query (If blastx or tblastn fail)
E-value
- Slide 23
- Slide 24
- Slide 25
-
BLAST (cont)(2) Compare the word list to the database and identify exact matches
WordList
Exact matches of words from word lists
databasesequence
(3) For each word match extend the alignment in both directions to find alignments that score greater than a threshold of value S
maximal segment pairs (MSPs)
Blast is a heuristic algorythm
לא משווים את מלוא רצף השאילתאלמלוא האורך של כא מן הרצפים במאגר
אלא מבצעים חיפוש (מרחב החיפוש)חלקי עס קירוב
Speed vs sensitivityDoes not find ALL best matches
False negativesכיצד נעריך את הממצאים המתקבלים
Raw score S of the alignment is usuallycalculated by summing the scores formatches mismatches and gaps in thealignment
Normalized score (bits) - bit scores fromdifferent alignments even those employingdifferent scoring matrices can be comparedThe higher the score the better the alignmentbut the significance of an alignment can notbe deduced from the score alone
E-value (Expectation value)bull Expect value of 10 for a match means in a
database of current size one might expect to see 10 matches with a similar or better score simply by chance alone
bull E-value is the most commonly used threshold in
database searches Only those hits with E-values
smaller than the set threshold will be reported in
the output
bull Increasing the E-value enables you to see
biologically related sequences but statistically
insignificant
To evaluate the alignmentbull Examine statistical parameters1048707Normalized score1048707E value1048707 identity1048707 similarity1048707 gapsbull Examine the alignment itselfbull Use biological common senseDonrsquot rely only on statistical significance
מרוב עצים לא רואים את היער
יותר מידי חזרות על אותם רצפים בעלי מובהקות גבוהה לא רואים רצפים בעלי דמיון נמוך יותר
שעשויים אף הם להיות מעניינים
What can we do if there are too many matches
bullLimit DB
bullLimit organism
bullFilter reported entries by keyword
bull(Limit to a specific domain)
bullChange matrix andor gap penalties
bullChange E-value
bullAdd filter for low complexity
ספירת האפשרויות השונות
What can we do if there are hardly
any matches
bullCheck choice of DB
bullCheck choice of organism
bullRemove filter for low complexity
bullChange matrix or gap penalties
bullIncrease E-value
DNA vs Protein searchesIf we have a nucleotide sequence should we search the
DNA databases only Or should we translate it to protein and search protein databases
Translating causes loss of information but protein sequence is more conserved than DNA sequence
It is therefore advisable to translate a nucleotide sequence to protein and search protein databases for homology
Query DNA Protein
Database DNA Protein
No ORF found No similar protein sequences were found Specific DNA databases are available (EST)
To find duplicated genes in a genome
To find pseudogenes
To find the location of non-protein coding genes
in the genome (siRNA etc)
Why use a nucleotide sequence after all
Blast flavors
BlastN - nt versus nt database BlastP - protein versus protein database BlastX - translated nt (6 frames) versus protein database tBlastN - protein versus translated nt database (6 frames) tBlastX - translated nt versus translated nt database (both 6 frames)
Query DNA Protein
DB DNA Protein
Uses of BLAST programs
BLASTx ndash compares a nucleotide query seq translated in all reading frames against a prot seq db
DNA protein
If you have a DNA seq and you want to now what protein (if any) it encodes you can perform BLASTx search
tBLASTn
tBLASTn ndash compares a protein query seq against a nucleotide seq db which is translated in all reading frames
Protein DNA
You can use this program to ask whether a DNA or ESTs db contains a nuc seq encoding a protein that matches your protein of interest
tBLASTx
tBLASTx ndash translates DNA from query and compares it to db of DNA seqs all translated to all reading frames
DNA DNA
(nr db cannot be used because itrsquos too large)
Used to determine whether an entire DNA db contains genes that encodes proteins similar to your query (If blastx or tblastn fail)
E-value
- Slide 23
- Slide 24
- Slide 25
-
Blast is a heuristic algorythm
לא משווים את מלוא רצף השאילתאלמלוא האורך של כא מן הרצפים במאגר
אלא מבצעים חיפוש (מרחב החיפוש)חלקי עס קירוב
Speed vs sensitivityDoes not find ALL best matches
False negativesכיצד נעריך את הממצאים המתקבלים
Raw score S of the alignment is usuallycalculated by summing the scores formatches mismatches and gaps in thealignment
Normalized score (bits) - bit scores fromdifferent alignments even those employingdifferent scoring matrices can be comparedThe higher the score the better the alignmentbut the significance of an alignment can notbe deduced from the score alone
E-value (Expectation value)bull Expect value of 10 for a match means in a
database of current size one might expect to see 10 matches with a similar or better score simply by chance alone
bull E-value is the most commonly used threshold in
database searches Only those hits with E-values
smaller than the set threshold will be reported in
the output
bull Increasing the E-value enables you to see
biologically related sequences but statistically
insignificant
To evaluate the alignmentbull Examine statistical parameters1048707Normalized score1048707E value1048707 identity1048707 similarity1048707 gapsbull Examine the alignment itselfbull Use biological common senseDonrsquot rely only on statistical significance
מרוב עצים לא רואים את היער
יותר מידי חזרות על אותם רצפים בעלי מובהקות גבוהה לא רואים רצפים בעלי דמיון נמוך יותר
שעשויים אף הם להיות מעניינים
What can we do if there are too many matches
bullLimit DB
bullLimit organism
bullFilter reported entries by keyword
bull(Limit to a specific domain)
bullChange matrix andor gap penalties
bullChange E-value
bullAdd filter for low complexity
ספירת האפשרויות השונות
What can we do if there are hardly
any matches
bullCheck choice of DB
bullCheck choice of organism
bullRemove filter for low complexity
bullChange matrix or gap penalties
bullIncrease E-value
DNA vs Protein searchesIf we have a nucleotide sequence should we search the
DNA databases only Or should we translate it to protein and search protein databases
Translating causes loss of information but protein sequence is more conserved than DNA sequence
It is therefore advisable to translate a nucleotide sequence to protein and search protein databases for homology
Query DNA Protein
Database DNA Protein
No ORF found No similar protein sequences were found Specific DNA databases are available (EST)
To find duplicated genes in a genome
To find pseudogenes
To find the location of non-protein coding genes
in the genome (siRNA etc)
Why use a nucleotide sequence after all
Blast flavors
BlastN - nt versus nt database BlastP - protein versus protein database BlastX - translated nt (6 frames) versus protein database tBlastN - protein versus translated nt database (6 frames) tBlastX - translated nt versus translated nt database (both 6 frames)
Query DNA Protein
DB DNA Protein
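The choice of flavor follows from the query type and the database type. A minimal sketch of that decision, with a hypothetical function name and a hypothetical flag for the nucleotide-versus-nucleotide case where the comparison should be made at the protein level:

```python
def choose_blast_program(query_type, db_type, compare_as_protein=False):
    """Pick a BLAST flavor from the query and database sequence types.

    query_type, db_type: "nucleotide" or "protein".
    compare_as_protein: only relevant when both are nucleotide; if True,
    both sides are translated in six frames (tblastx) instead of using blastn.
    """
    valid = {"nucleotide", "protein"}
    if query_type not in valid or db_type not in valid:
        raise ValueError("types must be 'nucleotide' or 'protein'")

    if query_type == "protein" and db_type == "protein":
        return "blastp"
    if query_type == "nucleotide" and db_type == "protein":
        return "blastx"    # query translated in six frames
    if query_type == "protein" and db_type == "nucleotide":
        return "tblastn"   # database translated in six frames
    # nucleotide query against nucleotide database
    return "tblastx" if compare_as_protein else "blastn"
```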
Uses of BLAST programs
BLASTx - compares a nucleotide query sequence, translated in all reading frames, against a protein sequence database
DNA → protein
If you have a DNA sequence and want to know what protein (if any) it encodes, you can perform a BLASTx search
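"Translated in all reading frames" means three frames on the given strand and three on its reverse complement. The sketch below illustrates the idea using the standard genetic code; it is not the BLAST implementation, and the function names are hypothetical. Codons containing characters other than A, C, G, T are rendered as "X" (an assumption made for this example).

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna, offset=0):
    """Translate complete codons starting at the given offset; '*' marks a stop."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(offset, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return translations for frames +1..+3 and -1..-3."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna, offset)
        frames[f"-{offset + 1}"] = translate(rc, offset)
    return frames
```

For the short sequence `ATGGCCTAA`, frame +1 reads `MA*` (Met, Ala, stop) while the other five frames give different, unrelated peptides; each of the six is then compared against the protein database.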
tBLASTn
tBLASTn - compares a protein query sequence against a nucleotide sequence database that is translated in all reading frames
Protein → DNA
You can use this program to ask whether a DNA or EST database contains a nucleotide sequence encoding a protein that matches your protein of interest
tBLASTx
tBLASTx - translates the DNA query in all reading frames and compares it to a database of DNA sequences, also translated in all reading frames
DNA → DNA
(the nr database cannot be used because it's too large)
Used to determine whether an entire DNA database contains genes that encode proteins similar to your query (if BLASTx or tBLASTn fail)