CBI 11/5/2011: Introduction to the ChEMBL Database ...

Post on 30-Jul-2020


CBI, 11/5/2011

Introduction to the ChEMBL Database

Searching for Drugs and Targets

Kazuyoshi Ikeda, ChEMBL

kaz@ebi.ac.uk

Topics

• What is ChEMBL?

• Contents

• Other ChEMBL resources

• Questions

Who are we?

• ChEMBL is a team at EMBL-EBI, on the outskirts of Cambridge, UK, set up in 2008.

• It began when a database developed by a company (Galapagos) was transferred to the EBI with Wellcome Trust funding (£4.7M).

• The group leader is John Overington; the team currently has 13 members.

Research & Databases at the EBI

• Genomes: Ensembl, Ensembl Genomes, EGA

• Nucleotide sequence: ENA

• Functional genomics: ArrayExpress, Expression Atlas

• Protein sequences: UniProt

• Protein families, motifs and domains: InterPro

• Macromolecular: PDBe

• Protein activity: IntAct, PRIDE

• Pathways: Reactome

• Systems: BioModels, BioSamples

• Literature and ontologies: CiteXplore, GO

• Chemical entities: ChEBI

• Chemogenomics: ChEMBL

- ChEMBL database

- Curation

- Interface

- Research group

- IMI eTox

- Industrial collaborations

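What is ChEMBLdb?

• A database of drugs and development compounds (drug-like cmpds)

• Free to search and download

• Compound structures and properties

• SAR information

• Target (protein) information

• MedChem literature from the past 30 years

• External data from PubChem and other sources

• FDA-approved drugs

• Secure access (HTTPS)

ChEMBL interface (Japanese)

ChEMBL on Wikipedia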

Drug Discovery Process

ChEMBL contents:

• > 6.9 million bioactivities

• > 1.1 million compounds

• > 8,000 targets

• ~12,000 candidates

• ~2,000 approved drugs

Stages and typical activities:

• Target Discovery: target identification, microarray profiling, target validation, assay development, biochemistry, clinical/animal disease models

• Lead Discovery: high-throughput screening (HTS), fragment-based screening, focused libraries, screening collection

• Lead Optimisation: medicinal chemistry, structure-based drug design, selectivity screens, ADMET screens, cellular/animal disease models, pharmacokinetics

• Preclinical Development: toxicology, in vivo safety pharmacology, formulation, dose prediction

• Phase 1: PK, tolerability

• Phase 2: efficacy

• Phase 3: safety & efficacy

• Launch: indication discovery & expansion

Data along the pipeline: Med. Chem. SAR (Discovery), Clinical Candidates (Development; Phases 1 to 3 are Clinical Trials), Drugs (Use)

ChEMBL Database

SAR Data

Compound

Assay

Ki = 4.5 nM

>Thrombin MAHVRGLQLPGCLALAALCSLVHSQHVFLAPQQARSLLQRVRRANTFLEEVRKGNLERECVEETCS

YEEAFEALESSTATDVFWAKYTACETARTPRDKLAACLEGNCAEGLGTNYRGHVNITRSGIECQLW

RSRYPHKPEINSTTHPGADLQENFCRNPDSSTTGPWCYTTDPTVRRQECSIPVCGQDQVTVAMTPR

SEGSSVNLSPPLEQCVPDRGQQYQGRLAVTTHGLPCLAWASAQAKALSKHQDFNSAVQLVENFCRN

PDGDEEGVWCYVAGKPGDFGYCDLNYCEEAVEEETGDGLDEDSDRAIEGRTATSEYQTFFNPRTFG

SGEADCGLRPLFEKKSLEDKTERELLESYIDGRIVEGSDAEIGMSPWQVMLFRKSPQELLCGASLI

SDRWVLTAAHCLLYPPWDKNFTENDLLVRIGKHSRTRYERNIEKISMLEKIYIHPRYNWRENLDRD

IALMKLKKPVAFSDYIHPVCLPDRETAASLLQAGYKGRVTGWGNLKETWTANVGKGQPSVLQVVNL

PIVERPVCKDSTRIRITDNMFCAGYKPDEGKRGDACEGDSGGPFVMKSPFNNRWYQMGIVSWGEGC

DRDGKYGFYTHVFRLKKWIQKVIDQFGE

APTT 11 min

Target

Compound

Bioactivity

What is ChEMBL data?

Extraction and Curation

• Data are extracted manually from the primary literature in MedChem journals (outsourced)

- J Med Chem, Bioorg Med Chem Lett, J Nat Prod etc

• In-house checks by curators (specialists in chemistry and in biology respectively)

- Chemical: Incorrect structures, Duplications, Missing salts etc

- Biological: Normalise gene names, Assign UniProt IDs, Target classification, Confidence scores

• Regular updates (once every 3 to 4 months)

• Data sharing with PubChem

- Only Confirmatory assays (a Single Interaction Value for a compound-protein pair) are imported, on a regular basis.

- PubChem also incorporates ChEMBL's literature data.

- Compound overlap is 2% or less.

• Neglected Disease Dataset (mainly malaria data)

- GSK & Novartis Malaria Screening Data

- Drugs for Neglected Diseases Initiative (DNDi) etc

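External Data Import (non-literature information)

ChEMBL data sources: Literature and PubChem

Assays: 667,868 (Literature), 410,112 (PubChem)

10,909 compounds

FJ Gamo et al. Nature 465(7296) 305-310 (2010)

ChEMBL Schema

• Compounds

• Drugs

• Targets

• Experiments (assays)

• Literature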

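Compounds

• Compound structure information is stored as Molfiles.

• Compounds are identified by Standard InChIs (stereochemistry is taken into account).

• Calculated properties (MW, PSA, logP, Ro5, Med_Chem_Friendly etc)

• Molregno is the internal compound ID.

• CHEMBL_ID is the official database ID, unique across Compounds, Assays and Targets.

- The prefix "CHEMBL" precedes the number.

- Example: CHEMBL123 (molregno=22942, chebi_id=122942)

• Entries other than small molecules, such as proteins, are also included (mainly drugs; no structure information).

Schema tables: compound properties, compound structures, compound dictionary information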
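The identifier convention described above (the literal prefix "CHEMBL" followed by a number) can be sketched as a small validator. This is only an illustration: the function names are hypothetical, and the pattern assumes nothing beyond what the slide states. Note that the number in the ID is not the molregno; the slide's example pairs CHEMBL123 with molregno=22942.

```python
import re

# Assumed pattern, taken from the slide: prefix "CHEMBL" followed by digits.
CHEMBL_ID_PATTERN = re.compile(r"CHEMBL(\d+)")


def parse_chembl_id(chembl_id: str) -> int:
    """Return the numeric part of a CHEMBL_ID, or raise ValueError."""
    match = CHEMBL_ID_PATTERN.fullmatch(chembl_id.strip())
    if match is None:
        raise ValueError(f"not a CHEMBL_ID: {chembl_id!r}")
    return int(match.group(1))


def format_chembl_id(number: int) -> str:
    """Build a CHEMBL_ID string from its numeric part."""
    if number < 1:
        raise ValueError("the numeric part must be a positive integer")
    return f"CHEMBL{number}"


if __name__ == "__main__":
    # The numeric part is independent of the internal molregno.
    print(parse_chembl_id("CHEMBL123"))
    print(format_chembl_id(1))
```

Because the same ID space covers compounds, assays and targets, a validator like this cannot tell which kind of entity an ID refers to; that still requires a lookup.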

Targets

• Mainly proteins; also organisms, cell lines, etc.

• Target proteins are identified by UniProt accessions and are further classified hierarchically.

- e.g. Enzyme > Kinase > Protein Kinase > TK > EGFR

• Cases where assigning a target is difficult

- Compound known only to bind to receptor family

- e.g. activity reported vs. 'Muscarinic receptors'

- Compound binds to multi-complex

- e.g. Ion channel

Schema tables: target class, target dictionary information

Target types: Protein, DNA, Organism, Cell Line

Experiments

• Besides Binding assays, which measure the binding of a compound to a target, there are also Functional and ADMET assays.

• Activity data are reported with a variety of endpoints:

- IC50 (half maximal inhibitory concentration)

- Ki (binding affinity)

- MIC (minimum inhibitory concentration)

- % Inhibition (of activity)

• Activity values are standardised (though not in every case):

- Standard Values, Units and Types
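To make the idea of standardised values and units concrete, the sketch below converts a concentration-type activity to nanomolar and to a negative-log scale. It is not ChEMBL's actual standardisation procedure: the unit spellings, the conversion table and the function names are assumptions chosen for illustration.

```python
import math

# Illustrative conversion factors to nanomolar (unit spellings are assumptions).
TO_NANOMOLAR = {"M": 1e9, "mM": 1e6, "uM": 1e3, "nM": 1.0, "pM": 1e-3}


def to_nanomolar(value: float, unit: str) -> float:
    """Convert a concentration to nM; raise ValueError for unknown units."""
    try:
        factor = TO_NANOMOLAR[unit]
    except KeyError:
        raise ValueError(f"unsupported concentration unit: {unit!r}") from None
    return value * factor


def negative_log_molar(value_nm: float) -> float:
    """Return -log10 of the molar concentration for a value given in nM."""
    if value_nm <= 0:
        raise ValueError("concentration must be positive")
    # value_nm * 1e-9 is the molar concentration, so -log10 = 9 - log10(nM).
    return 9.0 - math.log10(value_nm)


if __name__ == "__main__":
    ki_nm = to_nanomolar(4.5, "nM")  # the Ki = 4.5 nM example from the SAR slide
    print(ki_nm, round(negative_log_molar(ki_nm), 2))
```

Endpoints that are not concentrations, such as % Inhibition, fall outside this table and would raise `ValueError`, which matches the slide's caveat that not every record can be standardised.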

Schema tables: assays; activity information; assays and targets; standardised activity value (nM)

Classification of Functional Assays

The assay types below range from disease association (disease) to target association (target molecule):

• Whole organism assays (e.g., anti-infectives/parasitics)

• Disease-derived cell-line (e.g., human ovarian cancer cell line cytotoxicity)

• Tissue or cell-based disease model (e.g., glucose uptake by adipocytes)

• Tissue or cell-based assay for target effect (e.g., contraction of guinea-pig ileum)

• Cell-based assay over-expressing target (e.g., GPCR calcium mobilisation)

Marketed Drugs

Drug annotations:

• Drug class: small molecule, peptide, antibody etc.

• Rule of Five compliant

• First-in-class?

• Oral delivery?

• Parenteral delivery?

• Topical delivery?

• Single enantiomer?

• Prodrug?

• Boxed warning?

• Information on FDA-approved drugs is extracted from the Orange Book.

• The latest drug information (Recent Drug Approvals) is featured on the blog.

Web Services

• The REST API allows programmatic access.

• Compound (similarity & substructure), Target, Bioactivity Search

• Sample code is provided for Java, Perl and Python.

• XML and JSON output formats are supported.

• Example: compound lookup by CHEMBL_ID

• https://www.ebi.ac.uk/chemblws/compounds/CHEMBL1
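As a rough sketch of programmatic access, the snippet below builds the lookup URL shown on the slide and fetches it with Python's standard library. Only the base address and the `/compounds/<CHEMBL_ID>` path come from the slide; the function names are hypothetical, how the output format (XML or JSON) is selected is not stated in the slide and is therefore not modelled, and the address reflects the service as presented here.

```python
import re
from urllib.request import urlopen

# Base address copied from the slide's example URL.
BASE_URL = "https://www.ebi.ac.uk/chemblws"


def compound_url(chembl_id: str) -> str:
    """Build the compound lookup URL for a CHEMBL_ID."""
    if not re.fullmatch(r"CHEMBL\d+", chembl_id):
        raise ValueError(f"not a CHEMBL_ID: {chembl_id!r}")
    return f"{BASE_URL}/compounds/{chembl_id}"


def fetch_compound(chembl_id: str, timeout: float = 10.0) -> str:
    """Download the raw response body for a compound (performs a network call)."""
    with urlopen(compound_url(chembl_id), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Building the URL needs no network; fetch_compound() does.
    print(compound_url("CHEMBL1"))
```

Validating the ID before building the URL keeps malformed input from ever reaching the server.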

ChEMBL for Drug Discovery

Physchem Property Space and Affinity (chemical space and affinity)

[Figure: pIC50 binding data by molecular weight bin (<300, 300-400, 400-500, 500-600, >600) and ALogP, coloured from Good to Bad affinity, with RO5 and Drugs marked.]

- Potency generally increases with MWT

Physchem Property Space and ADMET

[Figure: rat bioavailability by molecular weight bin (<300, 300-400, 400-500, 500-600, >600) and ALogP, coloured from Good to Bad bioavailability, with RO5 and Drugs marked.]

- High MWT tends to lead to poor ADMET properties

QED Drug-likeness Trend

Drug-likeness Trends (property trends)

£Óe��)67ATRO5

��m��s89Ñ

&:;<[\A6¶>�A=>�Qqr�r

[Figure: Rule of Five Trend (Lipinski RO5): % of compounds in year, plotted against year.]

Bickerton et al. 2012

Comparison of Datasets

[Figure: median property (MW, ALOGP, AROM) for ChEMBL_13, PubChem, Drug and Drug>1999; secondary axis from 0 to 4.5. Value labels shown on the chart: 418, 369, 353, 401.]

*chembl_13 data from literature

*cmpds with MWT >1000 not included

Ligand Efficiency and Affinity

C. Abad-Zapatero DDT, 2005, 10, 464-469

[Figure: BEI plotted against SEI, coloured by activity value.]

BEI = pXC50*1000/MWT

SEI = pXC50*100/PSA

• Better to optimise compounds with highest efficiency not highest potency

• Puts compounds with different MWTs on same scale
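The two efficiency indices on this slide are simple ratios, so a short sketch may help show why ranking by efficiency can differ from ranking by potency. The formulas are the ones given on the slide (BEI = pXC50*1000/MWT, SEI = pXC50*100/PSA); the function names and the two example compounds are hypothetical.

```python
def bei(pxc50: float, mwt: float) -> float:
    """Binding efficiency index: pXC50 * 1000 / molecular weight."""
    if mwt <= 0:
        raise ValueError("molecular weight must be positive")
    return pxc50 * 1000.0 / mwt


def sei(pxc50: float, psa: float) -> float:
    """Surface efficiency index: pXC50 * 100 / polar surface area."""
    if psa <= 0:
        raise ValueError("polar surface area must be positive")
    return pxc50 * 100.0 / psa


if __name__ == "__main__":
    # Hypothetical compounds: (name, pXC50, MWT, PSA).
    compounds = [("A", 9.0, 600.0, 120.0), ("B", 8.0, 400.0, 80.0)]
    for name, pxc50, mwt, psa in compounds:
        print(name, bei(pxc50, mwt), sei(pxc50, psa))
    # A is the more potent compound, yet B has the higher BEI and SEI.
```

Dividing by molecular weight or polar surface area is what puts compounds of different sizes on the same scale, as the second bullet above notes.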

Prevalent Rings in ChEMBL

[Figure: most prevalent rings, with columns % (<=1990) and % (2001-2010). Rings are flagged if they are in the pre-1990 top 12 but not in the 2001-2010 top 12, or in the 2001-2010 top 12 but not in the pre-1990 top 12.]

% = no. of times ring appears / total no. of rings reported for that year range

Very few rings in the majority of bioactive molecules

similar to P Ertl J Med Chem 2006, 49, 4568-4573

How to search in ChEMBL

Searching ChEMBLdb

• Identifying Compounds interacting with Specific Targets

- Text search for protein names/synonyms

- Browse protein or organism tree

- Sequence search using BLAST

• Compound Searching

- Search by substructure or similarity

- Search by compound name

- Search by lists (SMILES, names, IDs)

ChEMBLdb Interface

• Browse Targets (target browser)

• Research Results: Serotonin Receptor

• Page elements: calc. properties, drug information, clickable structure, parent and salt forms, database links

Webinar (scheduled for 5/30)

Other ChEMBL Resources

• SARfari (protein family focused): SAR information database for Kinase/GPCR

• ChEMBL-NTD (malaria): database of drug candidate compounds for tropical diseases

• DrugEBIlity (druggability): drug target validation database

• eTOX: toxicology

Protein Families ChEMBL Timeline

[Figure: number of bioactivities per year (1980 to 2010; y-axis 0 to 40,000) for GPCR, Kinase, Protease, Ion Channel, Nuclear Receptor and Transporter.]

Protein Families in ChEMBL

% of bioactivities in ChEMBLdb:

• GPCR: 33%

• Kinase: 12%

• Protease: 9%

• Ion Channel: 5%

[Figure: Top10 by Target Class, covering Kinase, Protease, Ion Channel, GPCR and Transcription Factor.]

ChEMBLdb Contents (Targets)

% of Bioactivities

in ChEMBLdb

H3 CRTH2 P2Y12 EP3

Number of Bioactivities by Target (Top50)

Year: 2009 ~ Current

Number of Bioactivities by Compound (Top100)

Clozapine P2X(2) Antagonist

Year: 2009 ~ Current

Trends in GPCR Family

Kinase SARfari

Kinome Tree Target Browser

Kinase Domains & Compounds

Binding Site Similarity

3D Structure Analysis

ChEMBL DrugEBIlity Portal

Druggability, Tractability

• Ligand (drug-likeness): Rule of Five, drug-like

• Target (druggability): druggable

Ligand, Protein

* Lipinski & Hopkins, Nature 2004

              MW           HBD   HBA    LogP  RotB
Druggability  100<MW<550   ≤ 5   ≤ 10   ≤ 5   ≤ 10
Tractability  200<MW<800   ≤ 8   ≤ 15   ≤ 8   ≤ 16
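The two rows of ligand property limits above can be read as simple range checks. The sketch below encodes them under one assumption that should be stated clearly: the garbled comparison symbol in the extracted slide is taken to mean "less than or equal to" for HBD, HBA, LogP and RotB, while the MW bounds are strict as written. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Ligand property limits; MW bounds are exclusive, the rest inclusive."""
    mw_min: float
    mw_max: float
    hbd_max: int
    hba_max: int
    logp_max: float
    rotb_max: int


# Values copied from the slide's table (the "<=" reading is an assumption).
DRUGGABILITY = Limits(100, 550, 5, 10, 5, 10)
TRACTABILITY = Limits(200, 800, 8, 15, 8, 16)


def within(limits: Limits, mw: float, hbd: int, hba: int,
           logp: float, rotb: int) -> bool:
    """Return True if every property falls inside the given limits."""
    return (limits.mw_min < mw < limits.mw_max
            and hbd <= limits.hbd_max
            and hba <= limits.hba_max
            and logp <= limits.logp_max
            and rotb <= limits.rotb_max)


if __name__ == "__main__":
    # Hypothetical ligand: MW 600, 2 donors, 5 acceptors, LogP 3.0, 4 rotatable bonds.
    print(within(DRUGGABILITY, 600, 2, 5, 3.0, 4))
    print(within(TRACTABILITY, 600, 2, 5, 3.0, 4))
```

Keeping the limits in a frozen dataclass makes it easy to compare the stricter druggability window with the wider tractability window on the same ligand.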

[Figure: frequency histograms of Avg. Druggability for Protein Kinases, Protein Phosphatases and Proteases.]

Druggability Result -Family-

Avg. Druggability

Transcription Factor

(# of domains per each protein > 3)

Druggability Result -Site Details-

Tyrosine protein kinase

[Figure: frequency (0 to 0.35) against buried surface area [%] (56 to 88) for Site1 and Site4, compared with Small Drug Sites and All Data.]

Druggability Plots

[Figure: Avg. Ensemble Score against Avg. Druggable Score. Labelled proteins: Dihydrofolate Reductase (DHFR), Protein Kinase, Transcription Factor, Membrane Protein, Prothrombin, Galectin. Circles: known drug targets.]

Publication

• Bellis LJ, Akhtar R, Al-Lazikani B, Atkinson F, Bento AP, Chambers J, Davies M, Gaulton A, Hersey A, Ikeda K, Krüger FA, Light Y, McGlinchey S, Santos R, Stauch B, Overington JP. Biochem Soc Trans. 2011 Oct;39(5):1365-70.

• Gaulton A, Bellis L, Chambers J, Davies M, Hersey A, Light Y, McGlinchey S, Akhtar R, Atkinson F, Bento AP, Al-Lazikani B, Michalovich D, Overington JP. NAR. 2011 Database Issue.

The ChEMBL-og (latest news)

• ChEMBL blog

- http://chembl.blogspot.co.uk

Webinar

• Online seminars (webinars)

- 16-May-2012 3:30pm Schema and sql querying

- 30-May-2012 9:00am Interface and Searching (in Japanese)

- 13-Jun-2012 3:30pm Interface and Searching

- 27-Jun-2012 3:30pm Schema and sql querying

- 11-Jul-2012 3:30pm Interface and Searching

- http://chembl.blogspot.co.uk/2012/02/chembl-webinars-for-2012.html

• 30 May (Wed), from 5 pm Japan time

- Topic: "The ChEMBL interface and searching"

- Presented in Japanese

-! ��^\êQR��

- Enquiries: kaz@ebi.ac.uk

Help and Feedback

• chembl-help@ebi.ac.uk

• kaz@ebi.ac.uk (Japanese)

Acknowledgment

Mark Davies*

Shaun McGlinchey

Yvonne Light*

Louisa Bellis*

Ruth Akhtar

Francis Atkinson**

Patricia Bento

George Papadatos**

Jon Chambers**

Anna Gaulton**

Anne Hersey**

John Overington* **

(* formerly an industry researcher; ** from pharma)