Platform Day

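Platform Day 2008 | Hong Chang-bum (홍창범), Center for Genome Science, Korea National Institute of Health | May 30, 2008 | Mad Cow Disease and Platforms for Large-Scale Data Processing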

Transcript of Platform Day

Page 1: Platform Day


Page 2: Platform Day

Suddenly becoming an expert

Page 3: Platform Day
Page 4: Platform Day

M/M: Methionine = Met, Valine = Val

Page 5: Platform Day

British: 40%, Koreans: 95%

Page 6: Platform Day

gtg/atg
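The slide pairs GTG with ATG. Under the standard genetic code ATG encodes methionine (Met) and GTG encodes valine (Val), so a change in the first base alone switches the amino acid. A minimal Python sketch; the two-entry codon table is deliberately partial and the function names are illustrative, not a full genetic-code implementation.

```python
# Partial codon table: only the two codons shown on the slide.
CODON_TO_AMINO_ACID = {
    "ATG": "Met",  # methionine
    "GTG": "Val",  # valine
}


def translate_codon(codon: str) -> str:
    """Return the three-letter amino-acid code for a DNA codon (case-insensitive)."""
    return CODON_TO_AMINO_ACID[codon.upper()]


def point_differences(a: str, b: str) -> list:
    """0-based positions where two equal-length codons differ."""
    return [i for i, (x, y) in enumerate(zip(a.upper(), b.upper())) if x != y]


for codon in ("gtg", "atg"):
    print(codon, "->", translate_codon(codon))
print("differs at positions", point_differences("gtg", "atg"))
```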

Page 7: Platform Day

dbSNP

Page 8: Platform Day

Just a single difference, and yet...

Page 9: Platform Day

Single nucleotide polymorphism (SNP)

Page 10: Platform Day

Everyone has 23 pairs of chromosomes, and any two humans share 99.9% of their genetic information.

Page 11: Platform Day

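A sequence variant that occurs at a frequency of 1% or more in the genomes of a population

56% of baldness cases have a family history

22% have lactase deficiency; 44% have earlobes; 19% of Alzheimer's cases have a family history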
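The 1% definition at the top of this slide can be expressed as a minor-allele-frequency (MAF) check. A minimal sketch, assuming diploid genotypes coded as 0, 1 or 2 copies of one allele; the function names are hypothetical and the default threshold is the 1% stated above.

```python
def minor_allele_frequency(genotypes):
    """MAF from diploid genotypes coded as 0, 1 or 2 copies of one allele."""
    if not genotypes:
        raise ValueError("need at least one genotype")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be 0, 1 or 2")
    allele_freq = sum(genotypes) / (2 * len(genotypes))  # 2 alleles per person
    return min(allele_freq, 1.0 - allele_freq)


def is_polymorphism(genotypes, threshold=0.01):
    """True if the less common allele reaches the frequency threshold."""
    return minor_allele_frequency(genotypes) >= threshold
```

For 100 people with three heterozygotes the MAF is 3/200 = 1.5%, which clears the threshold; with a single heterozygote it is 0.5%, which does not.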

Page 12: Platform Day

Compiling the index to a giant book of 3 billion base pairs

Page 13: Platform Day

Human genome research

Page 14: Platform Day

The Human Genome Project began with DNA samples from three women and two men

Page 15: Platform Day

About 200 scientists from the UK, the US, Canada, Japan, Nigeria and China

Using DNA samples from about 269 people from Asia, Africa and the US

Page 16: Platform Day

1000 Genomes Project

Page 17: Platform Day

For ten thousand Koreans...

Page 18: Platform Day

2003: Human Genome Project, $2.7 billion; 2004: Dr. Craig Venter, $100 million

2008: Dr. Watson, $1 million; five years from now: $100

- MIT Technology Review, news of April 17, 2008 -

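Producing and analyzing data on the scale of 3 billion base pairs is now a reality, and it is time to implement integrated analysis with other omics data (proteome, transcriptome, metabolome, etc.)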
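For a sense of scale for the 3-billion-base-pair figure: each of A, C, G and T fits in 2 bits, so one genome packs into about 750 MB rather than 3 GB at one byte per base. A sketch of that packing, assuming an uppercase A/C/G/T alphabet only (ambiguous bases such as N are not handled); the function names are illustrative.

```python
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
DECODE = "ACGT"


def pack(seq):
    """Pack an A/C/G/T string at 2 bits per base, 4 bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq.upper()):
        out[i // 4] |= ENCODE[base] << (2 * (i % 4))
    return bytes(out)


def unpack(data, length):
    """Inverse of pack(); length is needed because the last byte may be partial."""
    return "".join(DECODE[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length))


def packed_size_bytes(n_bases):
    """Bytes needed to store n_bases at 2 bits each."""
    return (n_bases + 3) // 4


print(packed_size_bytes(3_000_000_000))  # 750000000 bytes, about 750 MB
```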

Page 19: Platform Day

And it keeps getting bigger...

Page 20: Platform Day

Large-scale research in a short time...

Sequencing, Microarrays

High-throughput Genotyping, Powerful Computation

Lab Automation

Page 21: Platform Day

About biological data

Page 22: Platform Day

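From a data-structure perspective

- Sequence data: one-dimensional arrays
- Structure data: multidimensional arrays
- Expression data: matrices
- Interaction data: networks
- Text data: documents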
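The five data shapes listed on this slide map onto ordinary containers. A toy sketch in Python; every value and the GENE_A/GENE_B and sample names are hypothetical placeholders, not real data.

```python
# 1-D array: sequence data (DNA, RNA, ESTs)
sequence = "ATTAGGACCAATAAGTCT"

# Multidimensional array: structure data, one (x, y, z) triple per atom
structure = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]

# Matrix: expression data, rows = genes, columns = samples
genes = ["GENE_A", "GENE_B"]
samples = ["sample1", "sample2", "sample3"]
expression = [
    [2.1, 0.4, 1.7],
    [0.9, 3.3, 0.2],
]

# Network: interaction data as an undirected adjacency list
interactions = {}


def add_interaction(graph, a, b):
    """Record an undirected edge between nodes a and b."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def expression_of(gene, sample):
    """Look up one matrix cell by gene and sample name."""
    return expression[genes.index(gene)][samples.index(sample)]


# Document: text data (papers, literature) kept as plain strings
abstracts = ["Hypothetical abstract text."]
```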

Page 23: Platform Day

Sequence data

Genome data such as DNA and RNA; EST sequences

SNP data

Page 24: Platform Day

Structure data

Protein 3-D structure data; mass spectrometry data

Page 25: Platform Day

Expression data

Microarray data, array CGH, ChIP-chip

Page 26: Platform Day

Network data

Pathways, protein interactions

Page 27: Platform Day

Text data

Papers and literature

Page 28: Platform Day

Genome research

Page 29: Platform Day

IT and genome research

- Database: GenBank, SWISS-PROT
- Hardware: supercomputers, clusters
- Machine learning: clustering, pattern recognition, classification
- Algorithms: sequence alignment
- Information retrieval: biomedical text analysis
- Agents: information filtering, monitoring agents

Genomic variation research, personalized medicine, personal genome

Page 30: Platform Day

Machine Learning

Sequence Alignment
- Simulated Annealing
- Genetic Algorithms

Structure and Function Prediction
- Hidden Markov Models
- Multilayer Perceptrons
- Decision Trees

Molecular Clustering and Classification
- Support Vector Machines
- Nearest Neighbor Algorithms

Expression Analysis
- Self-Organizing Maps
- Bayesian Networks

Toby Segaran, biotech software company
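Of the methods listed, nearest-neighbor classification is the simplest to show in a few lines. A minimal k-nearest-neighbor sketch, assuming each sample is a numeric feature vector (for example an expression profile) paired with a class label; the function name and any training data are illustrative.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training samples.

    train: list of (feature_vector, label) pairs; distance is Euclidean.
    """
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training samples")
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common keeps first-encountered order on ties, so a tie goes to the
    # label whose nearest member is closest to the query.
    return votes.most_common(1)[0][0]
```

Real analyses would also choose k, the distance metric and feature scaling carefully; this sketch fixes them for brevity.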

Page 31: Platform Day

Sequence Search with MPI
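A typical shape for a parallel sequence search is scatter (split the database across workers), search locally, then gather the hits. A sketch of that pattern, with threads from Python's standard `concurrent.futures` standing in for MPI ranks so it runs without an MPI installation (with mpi4py the same steps would map onto `comm.scatter` and `comm.gather`). Exact substring matching stands in for a real similarity search, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter(records, n_workers):
    """Split records into n_workers contiguous chunks (the 'scatter' step)."""
    if n_workers < 1:
        raise ValueError("need at least one worker")
    size, extra = divmod(len(records), n_workers)
    chunks, start = [], 0
    for rank in range(n_workers):
        end = start + size + (1 if rank < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


def search_chunk(chunk, motif):
    """Local search on one worker: names of (name, sequence) records containing motif."""
    return [name for name, seq in chunk if motif in seq]


def parallel_search(records, motif, n_workers=4):
    chunks = scatter(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_hits = list(pool.map(search_chunk, chunks, [motif] * n_workers))
    # 'Gather' step: concatenate per-worker hits in rank order.
    return [name for hits in partial_hits for name in hits]
```

Threads do not speed up CPU-bound Python code; the point here is only the data-distribution pattern, which is what carries over to MPI processes on a cluster.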

Page 32: Platform Day

Parallel sequence alignment on GPUs
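One common strategy in GPU alignment codes exploits the fact that, in the Smith-Waterman dynamic-programming matrix, every cell on one anti-diagonal depends only on the two previous anti-diagonals. A serial Python sketch that fills the matrix in that order, with a linear gap penalty and scoring parameters chosen for illustration; a GPU version would compute the inner loop's cells in parallel.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score, filling the DP matrix anti-diagonal by anti-diagonal."""
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    # Anti-diagonal d holds the cells with i + j == d. They read only from
    # diagonals d - 1 and d - 2, so cells within one d are independent.
    for d in range(2, m + n + 1):
        for i in range(max(1, d - n), min(m, d - 1) + 1):
            j = d - i
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```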

Page 33: Platform Day

Biology mashups built on web services (Taverna, myExperiment)

Page 34: Platform Day

Donating computing power (Folding@Home, Korea@Home)

Page 35: Platform Day

Puzzle solving with collective intelligence: Foldit

Page 36: Platform Day

Semantic Web (Freebase)

Page 37: Platform Day

Genome research platform

Affymetrix Genome-Wide Human SNP Assay 5.0, a high-density SNP array with 500,000 SNP probes

Page 38: Platform Day

Multidimensional Scaling (MDS): Northern and Western European (CEU), Nigerian (YRI), Japanese (JPT), Han Chinese (CHB)
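An MDS plot like this one places individuals so that their distances on the plot approximate genetic distances. A dependency-free sketch of classical (Torgerson) MDS, assuming a symmetric distance matrix between individuals is already available (how that matrix is derived from SNP genotypes is not shown on the slide). Power iteration stands in for a full eigendecomposition and is only reliable when the leading eigenvalues are positive and well separated.

```python
import math
import random


def classical_mds(dist, dims=2, iterations=500, seed=0):
    """Classical MDS: coordinates whose Euclidean distances approximate dist."""
    n = len(dist)
    # Double-center the squared distances: B = -1/2 * J * D^2 * J.
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]  # equals column means (symmetric)
    grand_mean = sum(row_mean) / n
    B = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration from a random unit vector: converges to the
        # eigenvector with the largest |eigenvalue| of the current B.
        v = [rng.random() - 0.5 for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iterations):
            w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # B is zero: nothing left to extract
            v = [x / norm for x in w]
        # Rayleigh quotient of the unit vector v gives its eigenvalue.
        Bv = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(v[i] * Bv[i] for i in range(n))
        scale = math.sqrt(max(eigval, 0.0))  # negative eigenvalues get no axis
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next pass finds the next eigenvector.
        B = [[B[i][j] - eigval * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```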

Page 39: Platform Day

Korean SNPs

12,995 SNPs in 2,978 genes

9,603 individuals genotyped through KARE (18 GB of genotype data)

A further 13,000 individuals are being genotyped

Page 40: Platform Day

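Clinical and epidemiological data

12 disease-specific centers (cardiovascular, neurological, diabetes, skin, infertility, congenital anomalies, etc.); cohorts such as Ansan (urban) and Anseong (rural)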

Page 41: Platform Day

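Disease association studies

Finding SNPs associated with a phenotype, using samples from healthy controls and patients

Studies of affected families and affected sibling pairs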
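For the case/control design on this slide, one standard first test is the allelic chi-square test on a 2×2 table of allele counts. A minimal sketch, assuming genotypes coded 0/1/2 and no continuity correction; the function names are illustrative, and the family-based designs also mentioned need different methods that are not covered here.

```python
import math


def allele_counts(genotypes):
    """(count of allele 1, count of allele 2) from genotypes coded 0/1/2 copies of allele 2."""
    alt = sum(genotypes)
    return 2 * len(genotypes) - alt, alt


def allelic_chi_square(case_genotypes, control_genotypes):
    """Pearson chi-square (1 df) on the 2x2 allele table, plus its p-value."""
    a, b = allele_counts(case_genotypes)
    c, d = allele_counts(control_genotypes)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the 2x2 table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```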

Page 42: Platform Day

Epidemiological variables

Page 43: Platform Day

KSNP (Korean SNP) Browser

Page 44: Platform Day

Disease prediction

Page 45: Platform Day

Genome Browser with Google Maps API

Page 46: Platform Day

BioBlog RSS: biology bloggers around the world

Page 47: Platform Day

It all comes down to the power of computers!

Simulation, Data Analysis

Data Acquisition, Data Management

Data Archiving -> Scientific Results, Publication

Page 48: Platform Day

Computing power at the Center for Genome Science

18-node IA2 cluster, 32-node x86 cluster

100-node PPC cluster

NAS and DAS storage, workstations

PCs

Page 49: Platform Day

Problems to solve

Page 50: Platform Day

Single platform

Growing capacity and size; accelerating data production

Lack of data

Page 51: Platform Day

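Multiple platforms

Cost problems; geographic problems

Synergy through collaboration; computing resources

Ground no one has yet been able to reach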

Page 52: Platform Day

...ATTAGGACCAATAAGTCT...

...ATTAGGAGCAATAAGTCT...

...ATTAGGAGCAATAACTCT...

...ATTAGGAGCAATAAGTCT...
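In the four aligned fragments above, the columns where the letters disagree are the SNP positions. A small sketch that finds them, assuming the sequences are already aligned to equal length (the surrounding dots are not part of the input); the function name is illustrative.

```python
def variable_sites(aligned):
    """0-based columns where equal-length aligned sequences do not all agree."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i in range(length) if len({s[i] for s in aligned}) > 1]


fragments = [
    "ATTAGGACCAATAAGTCT",
    "ATTAGGAGCAATAAGTCT",
    "ATTAGGAGCAATAACTCT",
    "ATTAGGAGCAATAAGTCT",
]
print(variable_sites(fragments))  # [7, 14]
```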

Person 1 +

Person 2 -

Single locus: (+,-) cost = 0.5M x 5k x 10k, 1 day

Pair of loci: (+,-) cost = 1/2 x 0.5M x 0.5M x 5k x 10k, 120 yr

(0.5M SNPs, a cohort of 5k individuals, 10k random data sets)

Computational problem
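The two cost formulas on this slide can be checked with exact integer arithmetic. A sketch using the slide's parameters (0.5M SNPs, 5k individuals, 10k random data sets); the pairwise count keeps the slide's 1/2 x N x N approximation of the number of SNP pairs, and the function names are illustrative.

```python
def single_locus_cost(n_snps, n_individuals, n_random_datasets):
    """Slide formula: N_snps x N_individuals x N_random."""
    return n_snps * n_individuals * n_random_datasets


def pair_of_loci_cost(n_snps, n_individuals, n_random_datasets):
    """Slide formula: 1/2 x N_snps x N_snps x N_individuals x N_random."""
    return n_snps * n_snps * n_individuals * n_random_datasets // 2


N_SNPS, N_INDIVIDUALS, N_RANDOM = 500_000, 5_000, 10_000
single = single_locus_cost(N_SNPS, N_INDIVIDUALS, N_RANDOM)
pair = pair_of_loci_cost(N_SNPS, N_INDIVIDUALS, N_RANDOM)
print(f"single locus: {single:.3e}")  # 2.500e+13
print(f"pair of loci: {pair:.3e}")    # 6.250e+18
print(f"ratio: {pair // single}")     # 250000
```

The ratio is N/2 for N SNPs, so moving from single loci to pairs multiplies the work by 250,000 at this SNP density.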


Page 53: Platform Day

Korea Meteorological Administration

Regional weather-station data + this date's weather last year =

40% chance of rain tomorrow

Page 54: Platform Day

Realistic Solution

Hair + weight + height + drinking habits =

Diabetes next week

Page 55: Platform Day

Google meets biotechnology

Page 56: Platform Day

Google Health BETA

Page 57: Platform Day

SNP information

Page 58: Platform Day

Gene information

Page 59: Platform Day

Who am I?

Page 60: Platform Day

How similar are we?

Page 61: Platform Day

What do we want?

Page 62: Platform Day

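Google search engine -> Google platform; Nutch search engine -> Hadoop platform

Easy access to analysis tools, development and application of new algorithms, access to large-scale data

-> A platform for biological data processing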
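Hadoop provides an open-source implementation of the MapReduce model behind Google's platform, and the model is easiest to see on a toy biological job such as k-mer counting. A plain-Python, in-memory imitation of the map, shuffle and reduce phases; it is not Hadoop's API (real jobs use Hadoop's Java MapReduce interfaces or Hadoop Streaming), and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read, k):
    """Map step: emit (k-mer, 1) for every k-mer in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_kmers(reads, k):
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    return reduce_counts(shuffle(mapped))
```

Because each read is mapped independently and each key is reduced independently, a framework can spread both phases across many machines.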

Page 63: Platform Day

A platform beyond platforms

One where every researcher can carry out their research free of charge

and exchange ideas with one another...

Page 65: Platform Day