Advanced Python Concepts: Modules
description
Transcript of Advanced Python Concepts: Modules
10/21/2013 BCHB524 - 2013 - Edwards
Advanced Python Concepts: Modules
BCHB5242013
Lecture 14
10/21/2013 BCHB524 - 2013 - Edwards 2
Modules
We've already used these from the python standard library and from BioPython
import sysfilename = sys.argv[1]file_handle = open(filename)
import Bio.SeqIOseq_records = Bio.SeqIO.parse(file_handle, "swiss")
import gzipfile_handle = gzip.open(filename)
import urllibfile_handle = urllib.urlopen(url)
import mathx = math.sqrt(y)
10/21/2013 BCHB524 - 2013 - Edwards 3
Modules
Modules contain previously defined functions, variables, and (soon!) classes.
Store useful code for re-use in many programs
Structurerelated codetogether
import sysimport MyNucStuff
seqfilename = sys.argv[1]
seq = MyNucStuff.read_seq_from_filename(seqfilename)cseq = MyNucStuff.complement(seq)rcseq = MyNucStuff.reverseComplement(seq)
print " Sequence:",seqprint " C sequence:",cseqprint "RC sequence:",rcseq
10/21/2013 BCHB524 - 2013 - Edwards 4
Modules
In MyNucStuff.pydef read_seq_from_filename(seq_filename): seq_file = open(seq_filename) dna_seq = ''.join(seq_file.read().split()) dna_seq = dna_seq.upper() seq_file.close() return dna_seq
compl_dict = {'A':'T', 'T':'A', 'C':'G', 'G':'C'}
def complement(seq): return ''.join(map(compl_dict.get,seq))
def revseq(seq): return ''.join(reversed(seq))
def reverseComplement(seq): return complement(revseq(seq)) # plus anything else you like...
10/21/2013 BCHB524 - 2013 - Edwards 5
Modules
We can import specific functions directly… And reference them without the module name
import sysfrom MyNucStuff import read_seq_from_filenamefrom MyNucStuff import complementfrom MyNucStuff import reverseComplement
seqfilename = sys.argv[1]
seq = read_seq_from_filename(seqfilename)cseq = complement(seq)rcseq = reverseComplement(seq)
print " Sequence:",seqprint " C sequence:",cseqprint "RC sequence:",rcseq
10/21/2013 BCHB524 - 2013 - Edwards 6
Modules
We can even import all the functions from a module…
import sysfrom MyNucStuff import *
seqfilename = sys.argv[1]
seq = read_seq_from_filename(seqfilename)cseq = complement(seq)rcseq = reverseComplement(seq)
print " Sequence:",seqprint " C sequence:",cseqprint "RC sequence:",rcseq
10/21/2013 BCHB524 - 2013 - Edwards 7
Packages
Packages are collections of modules, grouped together. All equivalent:
Implemented using files and folders/directories.
import Bio.SeqIOBio.SeqIO.parse(handle, "swiss")
from Bio import SeqIOSeqIO.parse(handle, "swiss")
from Bio.SeqIO import parseparse(handle, "swiss")
What can go wrong?
Sometimes our own .py files can "collide" with Python's packages.
Test what happens with an "empty" module in files: xml.py (and then try to import ElementTree) Bio.py (and then try to import SeqIO) etc…
10/21/2013 BCHB524 - 2013 - Edwards 8
10/21/2013 BCHB524 - 2013 - Edwards 9
A module for codon tables Module is called: codon_table Functions:
read_codons_from_filename(filename) returns dictionary of codons –
value is pair: (amino-acid symbol, initiation codon true/false) amino_acid(codon_table,codon)
returns amino-acid symbol for codon is_init(codon_table,codon)
returns true if codon is an initiation codon, false, otherwise get_ambig_aa (codon_table,codon)
Returns the single amino-acid consistent with ambiguous codon (containing N's), or X.
translate(codon_table,seq,frame) returns amino-acid sequence for DNA sequence seq
10/21/2013 BCHB524 - 2013 - Edwards 10
A module for codon tables
from MyNucStuff import *from codon_table import *import sys
if len(sys.argv) < 3: print "Require codon table and DNA sequence on command-line." sys.exit(1)
table = read_codons_from_filename(sys.argv[1])seq = read_seq_from_filename(sys.argv[2])
if is_init(table,seq[:3]): print "Initial codon is an initiation codon"
for frame in (1,2,3): print "Frame",frame,"(forward):",translate(table,seq,frame)
A module for codons
In codon_table.py:
10/21/2013 BCHB524 - 2013 - Edwards 11
def read_codons_from_filename(codonfile): # magic return codon_table
def amino_acid(table,codon): # magic return aa
def is_init(table,codon): # magic return init
def get_ambig_aa(table,codon): # magic return aa
def translate(table,seq,frame): # magic return aaseq
A module for codons
In codon_table.py:
10/21/2013 BCHB524 - 2013 - Edwards 12
def read_codons_from_filename(codonfile): f = open(codonfile) data = {} for l in f: sl = l.split() key = sl[0] value = sl[2] data[key] = value f.close()
b1 = data['Base1'] b2 = data['Base2'] b3 = data['Base3'] aa = data['AAs'] st = data['Starts']
codon_table = {} n = len(aa) for i in range(n): codon = b1[i] + b2[i] + b3[i] isInit = (st[i] == 'M') codon_table[codon] = (aa[i],isInit) return codon_table
10/21/2013 BCHB524 - 2013 - Edwards 13
Lab exercises Rework the lecture, and your solutions (or mine) from the
homework exercises #1 through #3, to make a MyDNAStuff module. Put as many useful nucleotide functions as possible into
the module...
Rework the lecture, and your solutions (or mine) from homework exercises #4 and #5, to make the codon_table module with functions specified in this lecture.
Demonstrate the use of these modules to translate an amino-acid sequence in all six-frames with just a few lines of code. The final result should look similar to Slide 10. Your program should handle DNA sequence with N’s in it.
Class Project: Expectations
40% of your grade!
Project Report Long version of your homework write-up
Project Presentation Demo your program Describe your project solution
10/21/2013 BCHB524 - 2013 - Edwards 14
10/21/2013 BCHB524 - 2013 - Edwards 15
Class Project: Blast Database
1. Write a program that computes all pairwise blast alignments for two species' proteomes and stores the alignments in a relational database.
2. Write a program that retrieves the blast alignment for two proteins (specified by their accessions) from the relational database.
3. Write a program that finds pairs of orthologous proteins that are mutually best hits in the species' proteomes.
10/21/2013 BCHB524 - 2013 - Edwards 16
Class Project: MS/MS Viewer
Write a program to display peptide fragmentation spectra from an mzXML file. The program will take an mzXML file, a scan
number, and a peptide sequence as input. The peptide's b-ion and y-ion m/z values should
be computed, and peaks matching these m/z values annotated with appropriate labels.
The output figure/plot should aid the user in determining whether or not the peptide is a good match to the spectrum.
10/21/2013 BCHB524 - 2013 - Edwards 17
Class Project: Protein Digest
Write a simple web-server application using TurboGears to carry out an in silico enzymatic digest of a user-provided protein sequence. Users should be able to specify min and max
length, min and max molecular weight, # of missed cleavages, and specific enzyme.
Output should be a table of peptides, with their length, molecular weight, # of missed cleavages, and amino-acids to left and right of each peptide in the protein sequence.