Integrated Genome and Proteome Databases
Chih-Chin Liu (劉志俊)
Department of Computer Science and Information Engineering, Chung Hua University
July 2008
Assistant Prof. Chih-Chin Liu Page 2
Outline
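Bioinformatics: A Database Perspective
The Four Major Data Types in Bioinformatics
Biological Database Design and UML
An Integrated Biological Database: UniBio
Pig/Native Chicken Genome Database
Proteome Database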
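When Biology Meets Information Science
Biology: molecular genetics, molecular biology, biochemistry, cell biology, protein science, immunology
Information science: programming languages, data structures, algorithms, databases, parallel processing, data mining
Bioinformatics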
Genome, Transcriptome, Proteome, Metabolome
Genome:
Transcriptome: The complement of expressed genes that are found in a particular cell or tissue.
Proteome: The complement of proteins that are found in a particular cell or tissue.
Metabolome: The assembly of substrates, metabolites, and other small molecules that are present in a population of cells.
More "-omes"
Structurome (∑ Structures)
SNPome (∑ SNPs)
Literaturome (∑ Literature)
Transductome (∑ Signal Transductions)
Pathwayome (∑ Pathways)
Diseasome (∑ Genetic Diseases)
"-ome" databases
Research Issues in Biological Databases
Data Modeling: how to store and represent biological data
Data Retrieval: how to retrieve similar biological objects
Data Mining: how to find the rules behind biological data
Simulation: pathway simulation, virtual cell, virtual life
New Data Types in Bio-Databases
Large Strings: DNA sequences, protein sequences
Biological Images: 2D gels, microarray images
3D Structures: proteins, compounds
Networks: pathways
New Data Types in Bio-Databases
Large Strings: DNA Sequences
The complete sequence of human (Homo sapiens) chromosome 1, 245,564,334 bp long, is the longest sequence record in GenBank.
New Data Types in Bio-Databases
Large Strings: Protein Sequences
PIR: I38344
The longest protein sequence in the PIR database
26,926 amino acids: titin, cardiac muscle [validated] - human
New Data Types in Bio-Databases
Images: Microarray (Stanford Microarray Database)
New Data Types in Bio-Databases
Images: 1D-Gel, 2D-Gel
New Data Types in Bio-Databases
3D Structures: Chemical Compound
New Data Types in Bio-Databases
3D Structures
New Data Types in Bio-Databases
3D Structures
ATOM      1  N   VAL     1      -4.004  15.224  13.636  1.00 32.64           N
ANISOU    1  N   VAL     1     4512   3449   4441   -335  -2675    320       N
ATOM      2  CA  VAL     1      -3.526  15.758  14.900  1.00 18.42           C
ANISOU    2  CA  VAL     1     1478   2233   3289   -286   -467    555       C
ATOM      3  C   VAL     1      -2.662  14.733  15.628  1.00 17.06           C
ANISOU    3  C   VAL     1     1603   1981   2899   -152   -466    234       C
ATOM      4  O   VAL     1      -3.053  13.569  15.714  1.00 18.61           O
ANISOU    4  O   VAL     1     1758   2150   3163   -489   -394    501       O
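The ATOM records above carry an atom serial number, atom name, residue, coordinates, occupancy, temperature factor, and element. The following is a minimal Python sketch of reading them, assuming whitespace-separated fields as printed on the slide and assuming the stray "-" belongs to the x coordinate. Real PDB files use fixed column positions, so this splitter is for illustration only. The names `Atom` and `parse_atom_line` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float
    element: str


def parse_atom_line(line: str) -> Atom:
    """Parse one whitespace-separated ATOM record (slide layout, not fixed columns)."""
    f = line.split()
    if len(f) != 11 or f[0] != "ATOM":
        raise ValueError(f"unexpected ATOM record: {line!r}")
    return Atom(
        serial=int(f[1]), name=f[2], res_name=f[3], res_seq=int(f[4]),
        x=float(f[5]), y=float(f[6]), z=float(f[7]),
        occupancy=float(f[8]), temp_factor=float(f[9]), element=f[10],
    )


# Two ATOM records and one ANISOU record taken from the slide.
records = """\
ATOM 1 N VAL 1 -4.004 15.224 13.636 1.00 32.64 N
ANISOU 1 N VAL 1 4512 3449 4441 -335 -2675 320 N
ATOM 2 CA VAL 1 -3.526 15.758 14.900 1.00 18.42 C
"""

# ANISOU lines (anisotropic temperature factors) are skipped in this sketch.
atoms = [parse_atom_line(l) for l in records.splitlines() if l.startswith("ATOM")]
for atom in atoms:
    print(atom.serial, atom.name, (atom.x, atom.y, atom.z))
```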
New Data Types in Bio-Databases
Network: Pathways
Database Design
Conceptual Database Design: class diagram (ER model, UML class diagram)
Entities (classes), relationships, attributes
Logical Database Design: relational schema
Normalization, ER-to-relational data model mapping
Physical Database Design: implementation (e.g. Oracle, MySQL, SQL Server)
Indexes and storage methods
The UniBio Project
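Completeness: collect every downloadable biology-related database
Integration: all data are cross-referenced, forming logically a single database
Chinese localization: provide corresponding Chinese-language data wherever possible, to lower the learning barrier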
The UniBio Project
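Download bioinformatics data in their original formats
Convert the data formats
Biological database
Biological database design
Biological database construction
Bioinformatics website
Tools: Perl, MySQL, UML, phpMyAdmin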
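The UniBio Project: Development Environment
RedHat Linux 9.0 (free, stable, high performance)
MySQL (free, the fastest-running database)
Apache (free, stable, powerful, high performance)
Perl (free, the main programming language in bioinformatics, concise code, cross-platform)
PHP (free, many built-in functions, easy to write, cross-platform)
C/C++ (free, long history, powerful)
Java (free, can be displayed on the Web, cross-platform)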
The UniBio Project
http://140.126.11.172/
Genome Data Management
Workflow: Sampling → Cloning → Sequencing → BLASTing → Submitting
Data stores: Sample Database, Clone Database, cDNA Database, BLAST Report Database, GenBank Submission Files
External databases: GenBank, EMBL, DDBJ, RefSeq, TIGR TGI, UniGene
Functional Genome Data Management
Workflow: Gene Expression Profile → in silico Simulation → in situ Verification → in vivo Testing → New Drug $$$
Data stores: MicroArray Database, Gene Expression Profile Database, Simulation Result Database, Verification Report Database, ???
Related databases: Enzyme, KEGG, cDNA Database
Pig/Native Chicken Genome Database
Pig/Native Chicken Genome Database: BLAST Results (GenBank)
Pig/Native Chicken Genome Database: dbEST Submission
TYPE: EST
STATUS: New
CONT_NAME: Wen-Chuan Lee
CITATION:
Porcine testis EST project
LIBRARY: Porcine testis cDNA library I
EST#: PDUts1001A02
CLONE: PDUts1001A02
SOURCE: Division of Biotechnology, Animal Technology Institute Taiwan
...
SEQ_PRIMER: T7 promoter primer
HIQUAL_START: 1
HIQUAL_STOP: 306
DNA_TYPE: cDNA
PUBLIC: 12/31/2005
SEQUENCE:
CTCAACCATTGATGGAGCATATTTCTCTATTTTTAGTAGATCTAGAAAAAAATAGTATGAAGTTAGATATCCTAAGAAGAGCAATTACCGCTATTTCATTATATTTTGCTTAAAAAAAAACAAGATTATTTTAATGGATATATCAAATCCTCGTGCACGATGTACAAAAATTAAAGCACGTCTGGGGCCACAAAGCACATCTCGATGAACTCTGAATAGATAGTACCAAGCAATTAGGTTATAAATTAATACTTTACAAGAGAATTTAGAAAATTTCATAGTTGCCCAGTGTAAGCTACCTTTCTA
||
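A submission record like the one above is a flat list of "TAG: value" fields closed by a "||" line. The following is a minimal Python sketch of reading such a record into a dictionary. It assumes one tag per line, upper-case tag names, and continuation lines for CITATION and SEQUENCE. The function name `parse_dbest_record` is hypothetical, and the sequence in the sample is truncated to its first 20 bases for brevity.

```python
def parse_dbest_record(text: str) -> dict[str, str]:
    """Parse one 'TAG: value' record terminated by a '||' line."""
    record: dict[str, str] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "||":          # end-of-record marker
            break
        tag, sep, value = line.partition(":")
        is_tag = (
            bool(sep)
            and " " not in tag
            and tag.replace("_", "").replace("#", "").isupper()
        )
        if is_tag:
            current = tag
            record[current] = value.strip()
        elif current == "SEQUENCE":
            record[current] += line            # join sequence lines without spaces
        elif current is not None:
            record[current] = (record[current] + " " + line).strip()
    return record


sample = """\
TYPE: EST
STATUS: New
CITATION:
Porcine testis EST project
EST#: PDUts1001A02
HIQUAL_START: 1
HIQUAL_STOP: 306
SEQUENCE:
CTCAACCATT
GATGGAGCAT
||
"""

rec = parse_dbest_record(sample)
print(rec["EST#"], rec["CITATION"], len(rec["SEQUENCE"]))
```

Going the other way, writing records out of the cDNA database in this layout, would be the step that produces the GenBank submission files.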
Integrated Proteomic Database
SWISS-PROT
KEGG
PDB
PIR
MIPS/JIPID
CATH
SCOP
LIGAND
ENZYME
BRENDA
PROSITE
PRINTS
BLOCKS
Pfam
EMOTIF
Dali/FSSP
BioCyc
WIT
Siena-2DPAGE
PMMA-2DPAGE
SWISS-2DPAGE
RESID
Plasma-2DPAGE
ATIT-2DPAGE
UniProt
MassSpec
2D Gel Electrophoresis
Molecular Weight Markers
Separation by Charge (pI)
Separation by Molecular Weight (MW)
Exploring Diseases
Detect the spots that changed.
Identify which proteins they are by PMF (Peptide Mass Fingerprinting).
They could be candidates for drug screening.
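The idea behind PMF can be sketched in a few lines of Python: digest each candidate protein in silico with trypsin, compute the peptide masses, and count how many observed peaks each candidate explains. This is a toy illustration, not Mascot's scoring. The two candidate sequences and the observed peak list are made up, and the function names are hypothetical. It assumes monoisotopic masses, singly protonated peptides, and no missed cleavages or modifications.

```python
import re

# Standard monoisotopic amino acid residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # one H2O per peptide (free N- and C-termini)
PROTON = 1.00728   # added to obtain the singly charged [M+H]+ mass


def tryptic_digest(protein: str) -> list[str]:
    """Cleave after K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed_mh: list[float], protein: str, tol: float = 0.2) -> int:
    """Number of observed [M+H]+ peaks explained by the protein's tryptic peptides."""
    theoretical = [peptide_mass(p) + PROTON for p in tryptic_digest(protein)]
    return sum(any(abs(obs - t) <= tol for t in theoretical) for obs in observed_mh)


# Hypothetical candidates and a hypothetical peak list for one gel spot.
candidates = {"protA": "MKAGRPLK", "protB": "GGGGKWWWWR"}
observed = [278.15, 641.41]

best = max(candidates, key=lambda name: count_matches(observed, candidates[name]))
print(best, count_matches(observed, candidates[best]))
```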
2D-PAGE Example: 2D123456_1.tif
2D-PAGE Spot Examples: 2D123456_1.out
"SSP" "MR" "PI" "TA20040301PH4~7"
"" "" "" "quantity"
0105 14.000000 0.940249 17718.58
0304 20.000000 0.100000 3015.93
0409 27.025288 2.881626 4703.69
0410 28.200542 3.015601 7963.92
0411 26.410089 3.035875 5168.19
0510 30.000000 0.100000 568.17
0610 45.000000 -1.000000 256.19
0708 70.379211 4.008969 12372.92
0709 60.177605 4.017597 60490.97
0710 71.341202 4.018401 20098.13
0711 68.146568 4.018714 25632.64
0712 57.148594 4.023514 73912.91
0713 66.000000 -1.000000 940.28
0902 116.400002 4.000000 160499.94
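A spot file such as 2D123456_1.out can be loaded with a short parser before it is uploaded to the gel database. The following is a minimal Python sketch, assuming two quoted header lines followed by whitespace-separated rows of SSP, MR, PI, and quantity. The names `Spot` and `parse_spots` are hypothetical, and the three sample rows are taken from the slide.

```python
from typing import NamedTuple


class Spot(NamedTuple):
    ssp: str        # kept as text so the leading zero in "0105" survives
    mr: float
    pi: float
    quantity: float


def parse_spots(text: str) -> list[Spot]:
    """Read data rows; skip quoted header lines and malformed rows."""
    spots = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or fields[0].startswith('"'):
            continue
        spots.append(Spot(fields[0], float(fields[1]), float(fields[2]), float(fields[3])))
    return spots


sample = '''\
"SSP" "MR" "PI" "TA20040301PH4~7"
"" "" "" "quantity"
0105 14.000000 0.940249 17718.58
0610 45.000000 -1.000000 256.19
0709 60.177605 4.017597 60490.97
'''

spots = parse_spots(sample)
strongest = max(spots, key=lambda s: s.quantity)
print(len(spots), strongest.ssp, strongest.quantity)
```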
Gel Database
A Gel UML Class Diagram for Modeling 2D-PAGE Images and Their Spots
Database: GelDB
Date: 2004/03/05
DBA: Chih-Chin Liu
Class Sample: Sample_ID, Description, Date, Qty, Method, Prepare, SampleType, Species, Organ, Tissue, Sex, Age, Genotype, Phenotype
Class Gel: Gel_ID, Expt_No, ImageFile, IPG_Strip, pH_Low, pH_High, Linear, pI_Low, pI_High, MW_Low, MW_High, Complexity, Property
Class Spot: SSP, MW, PI, Qty
Multiplicities: 1 to 1..n on both associations
Association name: electrophoresis
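The class diagram can be mapped to relational tables in the usual way: one table per class and a foreign key for each one-to-many association. The following is a minimal Python sketch using SQLite in place of MySQL so that it runs standalone. It assumes that one Sample has many Gels and one Gel has many Spots, and it keeps only a few of the attributes. The column types, the key choices, and the inserted sample row are all assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Sample (
    Sample_ID   TEXT PRIMARY KEY,
    Description TEXT,
    Species     TEXT,
    Tissue      TEXT
);
CREATE TABLE Gel (
    Gel_ID    TEXT PRIMARY KEY,
    Sample_ID TEXT NOT NULL REFERENCES Sample(Sample_ID),  -- assumed Sample 1 : 1..n Gel
    ImageFile TEXT,
    pH_Low    REAL,
    pH_High   REAL
);
CREATE TABLE Spot (
    Gel_ID TEXT NOT NULL REFERENCES Gel(Gel_ID),           -- assumed Gel 1 : 1..n Spot
    SSP    TEXT NOT NULL,
    MW     REAL,
    PI     REAL,
    Qty    REAL,
    PRIMARY KEY (Gel_ID, SSP)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

# Hypothetical sample and gel rows; the two spot rows reuse values from the spot example.
conn.execute("INSERT INTO Sample VALUES (?, ?, ?, ?)", ("S1", "hypothetical sample", None, None))
conn.execute("INSERT INTO Gel VALUES (?, ?, ?, ?, ?)", ("G1", "S1", "2D123456_1.tif", 4.0, 7.0))
conn.executemany(
    "INSERT INTO Spot VALUES (?, ?, ?, ?, ?)",
    [("G1", "0105", 14.0, 0.940249, 17718.58), ("G1", "0709", 60.177605, 4.017597, 60490.97)],
)

# All spots of one sample, strongest first.
rows = conn.execute(
    "SELECT s.SSP, s.Qty FROM Spot s JOIN Gel g ON g.Gel_ID = s.Gel_ID "
    "WHERE g.Sample_ID = ? ORDER BY s.Qty DESC",
    ("S1",),
).fetchall()
print(rows)
```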
MassSpec Database
Samples
MassSpec Analysis Results (.pkl)
Mascot Configuration
Mascot Query
Mascot Result (.dat)
Mascot Protein Reports
Mascot Peptide Reports
MassSpec Sample
MassSpec Instruments
Mass Spectrum Example: MIxxxxxx.pkl
Mascot Query Example
Mascot Search Result: Fxxxxxx.dat
MassSpec Database
A MassSpec UML Class Diagram for Modeling Mascot Search Results
Database: MassSpec
Date: 2003/12/20
DBA: Chih-Chin Liu
Class MassSpecResult: FileName, FileType, Instrument
Class PeakList: MassMin, MassMax, IntMin, IntMax, NumPeaks
Class Peak: PeakMass, PeakIntensity
Class MascotConfig: FastaVer, MascotVer, MSParserVer, Database, NumSeqs, NumResidues
Class MascotQuery: UserName, UserEmail, TaxonomyFilter, CleaveEnzyme, MissedCleave, StaticMods, ICAT, PeptideTol, PeptideTolUnit, FragmentTol, FragmentTolUnit, ChargeState, MassType, TypeOfSearch, PrecursorMass, CTermMass, NTermMass
Class MascotResult: FileName, NumHits, ExecTime, ObservedMass, ObservedCharge, ObservedMrValue, RepeatSearchString
Class MS_Protein: Accession, Description, Score, Mass, Frame, Coverage, NumPeptides
Class MS_Peptide: Query, Rank, PrettyRank, Matched, MissedCleave, MrCalc, Delta, Observed, Charge, MrExp, IonsMatched, PeptideStr, PeaksUsed1, VarModsStr, VarMods, IonsScore, SeriesUsed, PeaksUsed2, PeaksUsed3, PeptideIdTh, HomologyTh, ProbOfPep
Class 2D_PAGE_Spot
Association names: config, query, hit, contain, associated_with
Multiplicity labels: 1 to 1, 1 to 1..n, 1 to 0..*
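The PeakList class summarizes the Peak objects it contains (MassMin, MassMax, IntMin, IntMax, NumPeaks). The following is a minimal Python sketch of deriving that summary from a peak list. It assumes each line of the input holds a mass and an intensity separated by whitespace, which is a simplification of the .pkl layout. The peak values and the functions `read_peaks` and `summarize` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    PeakMass: float
    PeakIntensity: float


@dataclass
class PeakList:
    MassMin: float
    MassMax: float
    IntMin: float
    IntMax: float
    NumPeaks: int


def read_peaks(text: str) -> list[Peak]:
    """Read 'mass intensity' pairs, one per line; skip lines with fewer than two fields."""
    peaks = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        peaks.append(Peak(float(fields[0]), float(fields[1])))
    return peaks


def summarize(peaks: list[Peak]) -> PeakList:
    """Compute the PeakList attributes from its peaks."""
    if not peaks:
        raise ValueError("empty peak list")
    masses = [p.PeakMass for p in peaks]
    intensities = [p.PeakIntensity for p in peaks]
    return PeakList(min(masses), max(masses), min(intensities), max(intensities), len(peaks))


# Hypothetical peak values for illustration.
sample = """\
842.51 1200.0
1045.56 830.5
2211.10 410.0
"""

summary = summarize(read_peaks(sample))
print(summary)
```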
Flowchart
*.txt, *.pkl → Mascot Search (PMF) → *.dat → Mascot Parser → MassSpec Database
Proteome Data Management
Data items: Sample, 2D-PAGE, Spot, Mass Spectrum, Protein/Peptide Report
Files: *.tiff, *.out, *.pkl, *.dat
Operations: key-in, upload, upload/parsing
Databases: Gel Database, MassSpec Database
Proteome Database