Download - An Epigenetics Odyssey @ PyCon Taiwan 2013

Transcript

Page 1: An Epigenetics Odyssey @ PyCon Taiwan 2013

AN EPIGENETICS ODYSSEYWen-Wei Liao

[email protected]

mailto:[email protected]

mailto:[email protected]

Page 2: An Epigenetics Odyssey @ PyCon Taiwan 2013

Wen-Wei Liao (廖玟崴)

Work in Pao-Yang Chen’s Lab, Academia Sinica (陳柏仰)

EducationM.S. in Dept. of Systems Neuroscience, NTHUB.S. in Dept. of Life Science, NTHU

Talks“Use the Matplotlib, Luke,” PyCon Taiwan 2012“Matplotlib for Python Programmers,” PyHUG

Page 3: An Epigenetics Odyssey @ PyCon Taiwan 2013

PyCon Taiwan

Our Lab

Page 4: An Epigenetics Odyssey @ PyCon Taiwan 2013

Who’s Paul Graham?

Page 5: An Epigenetics Odyssey @ PyCon Taiwan 2013

“So if you're a CS major and you want to start a startup, instead of taking a class on entrepreneurship you're better off taking a class on, say, genetics. Or better still, go work for a biotech company. CS majors normally get summer jobs at computer hardware or software companies. But if you want to find startup ideas, you might do better to get a summer job in some unrelated field.

ー How to Get Startup Ideas, Paul Graham

Page 6: An Epigenetics Odyssey @ PyCon Taiwan 2013

Human

Page 7: An Epigenetics Odyssey @ PyCon Taiwan 2013

19 20 21 22 X Y

1 2 3 4 5

6 7 8 9 10 11 12

13 14 15 16 17 18

Chromosomes

Page 8: An Epigenetics Odyssey @ PyCon Taiwan 2013

Double Helix

Page 9: An Epigenetics Odyssey @ PyCon Taiwan 2013

2003Human Genome Project

3 billion bases (Gb)30 億

Page 10: An Epigenetics Odyssey @ PyCon Taiwan 2013

Page 11: An Epigenetics Odyssey @ PyCon Taiwan 2013

Why are Identical Twins different?

Page 12: An Epigenetics Odyssey @ PyCon Taiwan 2013

...GATTACACCCATGTCAGTGCG...

...CTAATGTGGGTACAGTCACGC...

DNA Sequences

Page 13: An Epigenetics Odyssey @ PyCon Taiwan 2013

...GATTACACCCATGTCAGTGCG...

...CTAATGTGGGTACAGTCACGC...

DNA Sequences

Page 14: An Epigenetics Odyssey @ PyCon Taiwan 2013

...GATTACACCCATGTCAGTGCG...

...CTAATGTGGGTACAGTCACGC...

m m

DNA Sequences

Page 15: An Epigenetics Odyssey @ PyCon Taiwan 2013

...GATTACACCCATGTCAGTGCG...

...CTAATGTGGGTACAGTCACGC...

m m

DNA Sequences

m = methylation (甲基化)

Page 16: An Epigenetics Odyssey @ PyCon Taiwan 2013

Epigenetics (表觀遺傳學): Above Genetics

DNA Methylation (甲基化)• Regulation of gene expression• Environment effect• Heritable marks

Page 17: An Epigenetics Odyssey @ PyCon Taiwan 2013

“The new science of epigenetics reveals how the choices you make can change your genes and those of your kids.”

Page 18: An Epigenetics Odyssey @ PyCon Taiwan 2013

Next-generation Sequencing

30 Gb (300 億)

Sonication

Adaptor Ligation

PCR Amplification

Sequencing in Parallel

Page 19: An Epigenetics Odyssey @ PyCon Taiwan 2013

Next-generation Sequencing

30 Gb (300 億)

Sonication

Adaptor Ligation

PCR Amplification

Sequencing in Parallel

Bisulfite Conversion

Page 20: An Epigenetics Odyssey @ PyCon Taiwan 2013

Detecting Cytosine Methylation

m mGATTACACCCATGT

GATTACATCTATGT

sodium bisulfite亞硫酸鈉

1. Apply sodium bisulfite

amplify

2. C → T, methylated C

(and A/T/G)

unchanged

3. Align new sequence to

known reference and

compare

Page 21: An Epigenetics Odyssey @ PyCon Taiwan 2013

Analysis Workflow

Sequence Mapping

MethylationCalling

Statistics&

Plotting

BS Seeker* mapping of bisulfite-treated reads

Biopython*parse bioinformatics files into Python utilizable data structures

PyTables*manage hierarchical datasets (HDF5) and design to efficiently cope with extremely large amounts of data

PysamPython wrapper of SAMtools C-APIread and manipulate SAM files

rePython built-in module for regular expression

NumPy&

SciPy

matrix operations, statistics, clustering, and more

Matplotlib data visualization

Page 22: An Epigenetics Odyssey @ PyCon Taiwan 2013

Mapping Approach: BS SeekerBS reads are C/T converted, so normal aligners are NOT applicable

3 letters alignment algorithm:

Chen et al. (2010). BMC Bioinformatics.

Convert C to T Bowtie mapping Restore to 4 letters compare alignment

BS read: AATCGTA AATTGTA

AATTGTA AATCGTA TTAATTGTAGG CTAATCGCAGG

Ref.genome: CTAATCGCAGG TTAATTGTAGG

Page 23: An Epigenetics Odyssey @ PyCon Taiwan 2013

TAGTGCGTGGTG CATTTTAGTGCGTGG TTTTAGCGCGTGGTGRef. genome ATTGAGACATCCTAGCGCGTGGTGACAATAATA

Methylation levels at single-base resolution

• Estimate methylation level at each covered C• Methylation level = #C / (#C + #T)

Page 24: An Epigenetics Odyssey @ PyCon Taiwan 2013

TAGTGCGTGGTG CATTTTAGTGCGTGG TTTTAGCGCGTGGTGRef. genome ATTGAGACATCCTAGCGCGTGGTGACAATAATA

Methylation levels at single-base resolution

• Estimate methylation level at each covered C• Methylation level = #C / (#C + #T)

Page 25: An Epigenetics Odyssey @ PyCon Taiwan 2013

TAGTGCGTGGTG CATTTTAGTGCGTGG TTTTAGCGCGTGGTGRef. genome ATTGAGACATCCTAGCGCGTGGTGACAATAATA

Methylation levels at single-base resolution

• Estimate methylation level at each covered C• Methylation level = #C / (#C + #T)

1 / (1 + 2) = 33.3%

Page 26: An Epigenetics Odyssey @ PyCon Taiwan 2013

TAGTGCGTGGTG CATTTTAGTGCGTGG TTTTAGCGCGTGGTGRef. genome ATTGAGACATCCTAGCGCGTGGTGACAATAATA

Methylation levels at single-base resolution

• Estimate methylation level at each covered C• Methylation level = #C / (#C + #T)

1 / (1 + 2) = 33.3%

Page 27: An Epigenetics Odyssey @ PyCon Taiwan 2013

TAGTGCGTGGTG CATTTTAGTGCGTGG TTTTAGCGCGTGGTGRef. genome ATTGAGACATCCTAGCGCGTGGTGACAATAATA

Methylation levels at single-base resolution

• Estimate methylation level at each covered C• Methylation level = #C / (#C + #T)

1 / (1 + 2) = 33.3%

3 / (3 + 0) = 100%

Page 28: An Epigenetics Odyssey @ PyCon Taiwan 2013

Biopython

from Bio import SeqIO

# load all the records into memory at oncewith open('some.fasta') as infile: seq = SeqIO.to_dict(SeqIO.parse(infile, 'fasta'))

# indexing approach for very large file# provide dictionary-like access to any recordseq = SeqIO.index('some.fasta', 'fasta')

>chr1TTTAATTATCTCTGAAATTTAAACCCCCAAATCCAGGTAATAAAGCAAGGAAATGTCTTACAGCCCAACACTTGCCATCAATACTTTTTCGATGTTA...>chr2GGCTGCTCTATCCTTTTCTGCACATTTGAACTCCTCCGCTGTGGGCCATTCTCATTTGCTTTACTTCCTAGTCTGAATTCCATGGGAACTGCATTTA...

FASTA file

Page 29: An Epigenetics Odyssey @ PyCon Taiwan 2013

PyTables

chr1 C 10502 CHG CT 0.000 0 14chr1 G 10504 CHG CA 0.000 0 29chr1 C 10506 CHH CC 0.000 0 15chr1 C 10507 CHG CT 0.067 1 15chr1 G 10509 CHG CA 0.000 0 30chr1 G 10511 CHH CT 0.000 0 30chr1 G 10512 CHH CC 0.000 0 29chr1 G 10514 CHH CT 0.000 0 28chr1 C 10517 CHG CT 0.000 0 15chr1 G 10519 CHG CA 0.000 0 21chr1 G 10521 CHH CA 0.045 1 22

. . . . . . . . . . . . . . . . . . . . . . . .

chrY G 63410 CHH CT 0.000 0 18

CGmap file

Page 30: An Epigenetics Odyssey @ PyCon Taiwan 2013

PyTables

root

group

tables

/

. . . . . .

cgmap

chr1chr2chrY

• Hierarchical Data Format (HDF5)

Page 31: An Epigenetics Odyssey @ PyCon Taiwan 2013

PyTables

import tables

class MethylSite(tables.IsDescription): strand = tables.StringCol(1) position = tables.Int64Col() context = tables.StringCol(3) dint = tables.StringCol(2) level = tables.Float32Col() mdepth = tables.Int32Col() depth = tables.Int32Col()

Page 32: An Epigenetics Odyssey @ PyCon Taiwan 2013

with open(args.cgmap) as cgmap: root = os.path.splitext(os.path.basename(args.cgmap))[0] with tables.openFile('{0}.h5'.format(root), 'w', root) as h5file: group = h5file.createGroup('/', 'cgmap') for line in cgmap: line = line.strip().split() try: table = group._f_getChild(line[0]) except tables.NoSuchNodeError: table = h5file.createTable(group, line[0], MethylSite) methylsite = table.row methylsite['strand'] = line[1] methylsite['position'] = int(line[2]) methylsite['context'] = line[3] methylsite['level'] = float(line[5]) methylsite['mdepth'] = int(line[6]) methylsite['depth'] = int(line[7]) methylsite.append() table.flush()

PyTables

Page 33: An Epigenetics Odyssey @ PyCon Taiwan 2013

with open(args.cgmap) as cgmap: root = os.path.splitext(os.path.basename(args.cgmap))[0] with tables.openFile('{0}.h5'.format(root), 'w', root) as h5file: group = h5file.createGroup('/', 'cgmap') for line in cgmap: line = line.strip().split() try: table = group._f_getChild(line[0]) except tables.NoSuchNodeError: table = h5file.createTable(group, line[0], MethylSite) methylsite = table.row methylsite['strand'] = line[1] methylsite['position'] = int(line[2]) methylsite['context'] = line[3] methylsite['level'] = float(line[5]) methylsite['mdepth'] = int(line[6]) methylsite['depth'] = int(line[7]) methylsite.append() table.flush()

PyTables

Page 34: An Epigenetics Odyssey @ PyCon Taiwan 2013

with open(args.cgmap) as cgmap: root = os.path.splitext(os.path.basename(args.cgmap))[0] with tables.openFile('{0}.h5'.format(root), 'w', root) as h5file: group = h5file.createGroup('/', 'cgmap') for line in cgmap: line = line.strip().split() try: table = group._f_getChild(line[0]) except tables.NoSuchNodeError: table = h5file.createTable(group, line[0], MethylSite) methylsite = table.row methylsite['strand'] = line[1] methylsite['position'] = int(line[2]) methylsite['context'] = line[3] methylsite['level'] = float(line[5]) methylsite['mdepth'] = int(line[6]) methylsite['depth'] = int(line[7]) methylsite.append() table.flush()

PyTables

Page 35: An Epigenetics Odyssey @ PyCon Taiwan 2013

with open(args.cgmap) as cgmap: root = os.path.splitext(os.path.basename(args.cgmap))[0] with tables.openFile('{0}.h5'.format(root), 'w', root) as h5file: group = h5file.createGroup('/', 'cgmap') for line in cgmap: line = line.strip().split() try: table = group._f_getChild(line[0]) except tables.NoSuchNodeError: table = h5file.createTable(group, line[0], MethylSite) methylsite = table.row methylsite['strand'] = line[1] methylsite['position'] = int(line[2]) methylsite['context'] = line[3] methylsite['level'] = float(line[5]) methylsite['mdepth'] = int(line[6]) methylsite['depth'] = int(line[7]) methylsite.append() table.flush()

PyTables

Page 36: An Epigenetics Odyssey @ PyCon Taiwan 2013

with open(args.cgmap) as cgmap: root = os.path.splitext(os.path.basename(args.cgmap))[0] with tables.openFile('{0}.h5'.format(root), 'w', root) as h5file: group = h5file.createGroup('/', 'cgmap') for line in cgmap: line = line.strip().split() try: table = group._f_getChild(line[0]) except tables.NoSuchNodeError: table = h5file.createTable(group, line[0], MethylSite) methylsite = table.row methylsite['strand'] = line[1] methylsite['position'] = int(line[2]) methylsite['context'] = line[3] methylsite['level'] = float(line[5]) methylsite['mdepth'] = int(line[6]) methylsite['depth'] = int(line[7]) methylsite.append() table.flush()

PyTables

Page 37: An Epigenetics Odyssey @ PyCon Taiwan 2013

with open(args.cgmap) as cgmap: root = os.path.splitext(os.path.basename(args.cgmap))[0] with tables.openFile('{0}.h5'.format(root), 'w', root) as h5file: group = h5file.createGroup('/', 'cgmap') for line in cgmap: line = line.strip().split() try: table = group._f_getChild(line[0]) except tables.NoSuchNodeError: table = h5file.createTable(group, line[0], MethylSite) methylsite = table.row methylsite['strand'] = line[1] methylsite['position'] = int(line[2]) methylsite['context'] = line[3] methylsite['level'] = float(line[5]) methylsite['mdepth'] = int(line[6]) methylsite['depth'] = int(line[7]) methylsite.append() table.flush()

PyTables

Page 38: An Epigenetics Odyssey @ PyCon Taiwan 2013

PyTables

with tables.openFile(args.h5file, 'a') as h5file: # indexed mode. Indexing is just a kind of sorting operation # over a column, so that searches along such a column will look # at this sorted information by using a binary search table = h5file.root.cgmap.chr1 table.cols.position.createIndex() table.cols.level.createIndex() table.cols.depth.createIndex()

# in-kernel mode, the condition is passed to the PyTables kernel, # written in C, and evaluated there at full C speed condition = "(depth >= 4) & (level >= 0.05)" res = [row['position'] for row in table.where(condition)]

condition = "(position > 1000) & (position < 5000)" res = [row['position'] for row in table.where(condition)]

Page 39: An Epigenetics Odyssey @ PyCon Taiwan 2013

“So if you're a CS major and you want to start a startup, instead of taking a class on entrepreneurship you're better off taking a class on, say, genetics. Or better still, go work for a biotech company. CS majors normally get summer jobs at computer hardware or software companies. But if you want to find startup ideas, you might do better to get a summer job in some unrelated field.

ー How to Get Startup Ideas, Paul Graham

Page 40: An Epigenetics Odyssey @ PyCon Taiwan 2013

join us :)

Top Related

Epigenetics brochure

Epigenetics brochure

علم الوراثة اللاجيني Epigenetics

علم الوراثة اللاجيني Epigenetics

2004 National AFVDay Odyssey · The Odyssey Continues 1 2004 National AFV Day Odyssey, The Odyssey Continues… National AFV Day Odyssey is the spark of an idea that ignited a vision

2004 National AFVDay Odyssey · The Odyssey Continues 1 2004 National AFV Day Odyssey, The Odyssey Continues… National AFV Day Odyssey is the spark of an idea that ignited a vision

Pycon APAC 2013 closing

Pycon APAC 2013 closing

Introduction of PyCon APAC

Introduction of PyCon APAC

PyCon 2015 - Python Epiphanies

PyCon 2015 - Python Epiphanies

Languages

Pages

Legal

Copyright © 2022 FDOCUMENT