Optimized Virtual Screening Miklós Vargyas Zsuzsanna Szabó György Pirok Ferenc Csizmadia ChemAxon...

Posted on 26-Mar-2015


Optimized Virtual Screening

Miklós Vargyas, Zsuzsanna Szabó, György Pirok, Ferenc Csizmadia

ChemAxon Ltd.

Matthias Steger, Modest von Korff

AXOVAN AG, Allschwil, Switzerland

(Axovan is now Actelion.)

Slide 1

Drug research

[Diagram: structures found in the corporate database]

Is it searching for a needle in a haystack?

Slide 2

structures found (virtual hits)

query structures (known actives)

corporate database (targets)

Find something similar to a fistful of needles

Drug research

Slide 3

Molecular similarity

Chemical, pharmacological or biological properties of two compounds match.

The more features two molecules have in common, the higher their similarity.

Chemical

Pharmacophore

What is it?

Slide 4

Molecular similarity

How to calculate it?

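Tanimoto similarity of fingerprints x and y, where B(·) is the number of bits set:

$$T(x,y) = \frac{B(x \,\&\, y)}{B(x) + B(y) - B(x \,\&\, y)}$$

Euclidean distance of n-dimensional vectors x and y:

$$E(x,y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$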

Sequences (vectors) of bits or of numeric values that can be compared by distance functions or similarity metrics.

Quantitative assessment of the similarity or dissimilarity of structures needs a numerically tractable form: molecular descriptors, fingerprints, structural keys.
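A minimal Python sketch of the two measures on this slide (not ChemAxon's implementation). It assumes a binary fingerprint is stored as an integer used as a bit set and a numeric descriptor as a list of floats; the function names are hypothetical.

```python
import math

def popcount(bits: int) -> int:
    """Number of set bits, B(x) in the Tanimoto formula."""
    return bin(bits).count("1")

def tanimoto(x: int, y: int) -> float:
    """Tanimoto similarity of two binary fingerprints stored as integers."""
    common = popcount(x & y)
    union = popcount(x) + popcount(y) - common
    # Two empty fingerprints: treated as identical (an assumed convention).
    return common / union if union else 1.0

def euclidean(x: list[float], y: list[float]) -> float:
    """Euclidean distance of two equal-length numeric descriptor vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(tanimoto(0b1011, 0b0011))
print(euclidean([0, 0], [3, 4]))
```

Storing bit fingerprints as integers keeps the AND and bit counting in the Tanimoto formula to single operations.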

Slide 5

Hashed binary fingerprint

• encodes topological properties of the chemical graph: connectivity, edge labels (bond types), node labels (atom types)

• allows the comparison of two molecules with respect to their chemical structure

Molecular descriptors

Example 1: chemical fingerprint

Construction

1. Find all 0, 1, …, n step walks in the chemical graph.
2. Generate a bit array for each walk with a given number of bits set.
3. Merge the bit arrays with a logical OR operation.
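The three construction steps can be sketched in Python under several assumptions not stated on the slides: walks do not revisit an atom (which matches the ethanol example that follows), only atom labels are hashed (bond labels are omitted for brevity), SHA-256 supplies the hash, and the fingerprint length `n_bits` and `bits_per_walk` are hypothetical parameters.

```python
import hashlib

def simple_walks(adjacency, labels, max_steps):
    """Step 1: yield the label string of every walk of 0..max_steps steps
    that does not revisit an atom."""
    stack = [[atom] for atom in range(len(labels))]
    while stack:
        walk = stack.pop()
        yield "-".join(labels[a] for a in walk)
        if len(walk) - 1 < max_steps:
            for neighbour in adjacency[walk[-1]]:
                if neighbour not in walk:
                    stack.append(walk + [neighbour])

def walk_bits(walk, n_bits, bits_per_walk):
    """Step 2: hash one walk to a bit array (an int) with up to bits_per_walk bits set."""
    bits = 0
    for k in range(bits_per_walk):
        digest = hashlib.sha256(f"{k}:{walk}".encode()).digest()
        bits |= 1 << (int.from_bytes(digest[:4], "big") % n_bits)
    return bits

def chemical_fingerprint(adjacency, labels, max_steps=3, n_bits=64, bits_per_walk=2):
    """Step 3: merge the bit arrays of all walks with logical OR."""
    fingerprint = 0
    for walk in simple_walks(adjacency, labels, max_steps):
        fingerprint |= walk_bits(walk, n_bits, bits_per_walk)
    return fingerprint

# Ethanol heavy-atom graph C-C-O (hydrogens omitted for brevity).
fp = chemical_fingerprint([[1], [0, 2], [1]], ["C", "C", "O"])
print(format(fp, "064b"))
```

Because hashing is deterministic, the same walk always sets the same bits, so every walk shared by two molecules contributes identical bits to both fingerprints.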

Slide 6

Molecular descriptors

Example 1: chemical fingerprint

Example: CH3 – CH2 – OH

walks from the first carbon atom

length walk bit array

0 C 1010000000

1 C – H 0001010000

1 C – C 0001000100

2 C – C – H 0001000010

2 C – C – O 0100010000

3 C – C – O – H 0000011000

merge bit arrays for the first carbon atom: 1111011110
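The merge step of this example can be checked in a few lines of Python by parsing each bit array from the table as a base-2 integer and combining them with bitwise OR.

```python
# Bit arrays of the walks starting at the first carbon atom, copied from the table.
walk_bit_arrays = {
    "C": "1010000000",
    "C-H": "0001010000",
    "C-C": "0001000100",
    "C-C-H": "0001000010",
    "C-C-O": "0100010000",
    "C-C-O-H": "0000011000",
}

merged = 0
for bit_array in walk_bit_arrays.values():
    merged |= int(bit_array, 2)  # logical OR of the walk bit arrays

print(format(merged, "010b"))  # 1111011110
```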

Slide 7

Molecular descriptors

Example 1: chemical fingerprint

0100010100010100010000000001101010011010100000010100000000100000

0100010100010100010000000001101010011010100000000100000000100000

Slide 8

Molecular descriptors

Example 2: pharmacophore fingerprint

Pharmacophore fingerprint

• encodes pharmacophore properties of molecules as frequency counts of pharmacophore point pairs at a given topological distance

• allows the comparison of two molecules with respect to their pharmacophore

Construction

1. Map pharmacophore point types to atoms.
2. Calculate the length of the shortest path between each pair of atoms.
3. Assign a histogram to every pharmacophore point pair and count the frequency of the pair with respect to its distance.
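A minimal sketch of these steps in Python. It assumes each atom carries at most one point type ("A" acceptor, "D" donor, "H" hydrophobic, or None), that distances are counted in bonds by breadth-first search, and that the histogram stops at a hypothetical `max_dist` of 6. Keys such as "DA1" or "HH6" mirror the bin labels on the next slide.

```python
from collections import Counter, deque
from itertools import combinations

def bfs_distances(adjacency, start):
    """Step 2: topological distance (number of bonds) from start to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist

def pharmacophore_fingerprint(point_types, adjacency, max_dist=6):
    """Step 3: count pharmacophore point pairs per (pair type, distance) bin."""
    typed = [i for i, t in enumerate(point_types) if t is not None]  # step 1 output
    distances = {i: bfs_distances(adjacency, i) for i in typed}
    counts = Counter()
    for i, j in combinations(typed, 2):
        d = distances[i].get(j)
        if d is None or d > max_dist:
            continue  # disconnected, or beyond the last histogram bin
        # Reverse alphabetical order gives the slide's pair names: DA, HA, HD, ...
        pair = "".join(sorted((point_types[i], point_types[j]), reverse=True))
        counts[f"{pair}{d}"] += 1
    return counts

# Hypothetical 4-atom chain 0-1-2-3 typed H, none, D, A.
print(pharmacophore_fingerprint(["H", None, "D", "A"], [[1], [0, 2], [1, 3], [2]]))
```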

Slide 9

Molecular descriptors

Example 2: pharmacophore fingerprint

Atoms colored by pharmacophore point type: acceptor, donor, hydrophobic, none.

[Figure: two pharmacophore fingerprint histograms; bins AA1 to AA6, DA1 to DA6, DD1 to DD6, HA1 to HA6, HD1 to HD6, HH1 to HH6 (point pair type and topological distance); value axis 0 to 12]

Slide 10

[Diagram: the query structure is converted into a query fingerprint and the targets into target fingerprints; targets whose fingerprints lie in close proximity to the query fingerprint are the hits]

Virtual screening using fingerprints

Individual query structure
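A minimal sketch of screening with an individual query, assuming Tanimoto dissimilarity (1 − T) as the proximity measure and a hypothetical dissimilarity threshold. Fingerprints are integers used as bit sets, and `screen` is an illustrative name, not a ChemAxon API.

```python
def popcount(bits: int) -> int:
    return bin(bits).count("1")

def tanimoto_dissimilarity(x: int, y: int) -> float:
    """1 - Tanimoto; B(x) + B(y) - B(x & y) equals the popcount of x | y."""
    union = popcount(x | y)
    return 1.0 - popcount(x & y) / union if union else 0.0

def screen(query_fp: int, target_fps: dict[str, int], threshold: float) -> list[tuple[str, float]]:
    """Return (target id, dissimilarity) for every target within threshold, closest first."""
    hits = []
    for target_id, fp in target_fps.items():
        d = tanimoto_dissimilarity(query_fp, fp)
        if d <= threshold:
            hits.append((target_id, d))
    return sorted(hits, key=lambda hit: hit[1])

hits = screen(0b1111, {"t1": 0b1111, "t2": 0b1110, "t3": 0b0001}, threshold=0.3)
print(hits)  # [('t1', 0.0), ('t2', 0.25)]
```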

Slide 11

[Diagram: the queries are compiled into a hypothesis fingerprint and the targets into target fingerprints; targets whose fingerprints lie in close proximity to the hypothesis fingerprint are the hits]

Virtual screening using fingerprints

Multiple query structures

Slide 12

Hypothesis fingerprints

allow faster operation; compile the features common to the individual actives

Active 1 0 2 7 1 0 1 6 4 0 0 9 0

Active 2 1 6 0 4 3 3 1 2 2 0 5 1

Active 3 2 4 4 1 0 2 5 3 4 3 4 5

Minimum 0 2 0 1 0 1 1 2 0 0 4 0

Average 1 4 3.67 2 1 2 4 3 2 1.33 6 2

Median 1.5 4 5.5 1 0 2 5 3 3 0 5 3

Hypothesis types

Advantages

Slide 13

Hypothesis fingerprints

Minimum

• advantage: strict conditions for hits if actives are fairly similar

• disadvantages: false results with asymmetric metrics; misses common features of highly diverse sets; very sensitive to one missing feature

Average

• advantage: captures common features of more diverse active sets

• disadvantage: less selective if actives are very similar

Median

• advantages: captures common features of more diverse active sets; specific treatment of the absence of a feature; less sensitive to outliers

• disadvantage: less selective if actives are very similar
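The three hypothesis types can be sketched column by column over the actives' count vectors. The median's "specific treatment of the absence of a feature" is not spelled out on the slides. The rule assumed here is inferred from the example table: the result is 0 if the feature is absent in most actives, otherwise it is the median of the non-zero counts. The function names are hypothetical.

```python
from statistics import mean, median

def minimum_hypothesis(actives):
    return [min(column) for column in zip(*actives)]

def average_hypothesis(actives):
    return [mean(column) for column in zip(*actives)]

def median_hypothesis(actives):
    """Assumed rule: 0 if the feature is absent in most actives,
    otherwise the median of the non-zero counts."""
    result = []
    for column in zip(*actives):
        present = [v for v in column if v != 0]
        if len(present) * 2 < len(column):
            result.append(0)
        else:
            result.append(median(present))
    return result

# The three actives from the example table.
actives = [
    [0, 2, 7, 1, 0, 1, 6, 4, 0, 0, 9, 0],
    [1, 6, 0, 4, 3, 3, 1, 2, 2, 0, 5, 1],
    [2, 4, 4, 1, 0, 2, 5, 3, 4, 3, 4, 5],
]
print(minimum_hypothesis(actives))
print([round(v, 2) for v in average_hypothesis(actives)])
print(median_hypothesis(actives))
```

Applied to the three actives above, the sketch reproduces the Minimum and Median rows. For the tenth feature (counts 0, 0 and 3) it gives an average of 1, whereas the table shows 1.33.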

Slide 14

Does this work?

Slide 15

Enrichment by active set and metric

Active set: name, size | Pharmacophore fingerprint: Tanimoto, Euclidean | Chemical fingerprint: Tanimoto, Euclidean

5-HT3 12 20.14 12.55 776.19 461.44

ACE 89 1.99 1.42 3.71 1.74

Angiotensin2 10 22.80 27.81 183.45 173.91

Beta2 50 3.59 1.52 7.52 2.65

D2 13 61.25 27.64 302.52 155.61

delta 20 109.53 11.66 114.48 56.22

Ftp 35 50.92 46.88 571.50 575.16

mGluR1 18 70.47 5.59 347.72 130.14

NPY-5 139 1.09 1.00 1.46 1.44

Thrombin 8 2.46 2.56 3.71 1.67

Then why do we need optimization? Too many hits.

Slide 16

Then why do we need optimization?

[Figure: example structures with dissimilarity values 0.47, 0.55 and 0.57]

Inconsistent dissimilarity values

Slide 17

What can be optimized?

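Weighted asymmetric Euclidean distance:

$$D^{\mathrm{weighted}}_{\mathrm{asym.\,Euclidean}}(x,y) = \sqrt{\alpha \sum_{i:\,x_i > y_i} w_i (x_i - y_i)^2 + (1-\alpha) \sum_{i:\,x_i < y_i} w_i (x_i - y_i)^2}$$

Scaled asymmetric Tanimoto dissimilarity:

$$D^{\mathrm{scaled}}_{\mathrm{asym.\,Tanimoto}}(x,y) = 1 - \frac{s \sum_i \min(x_i, y_i)}{\alpha \sum_i \bigl(x_i - s\min(x_i, y_i)\bigr) + (1-\alpha) \sum_i \bigl(y_i - s\min(x_i, y_i)\bigr) + s \sum_i \min(x_i, y_i)}$$

where α ∈ [0, 1] is the asymmetry factor, s ∈ ℕ is the scaling factor and w_i ∈ [0, 1] are the weights.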

Parameterized metrics
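A minimal sketch of the weighted asymmetric Euclidean distance. The exact form is an assumption reconstructed from the slide's parameter list: the asymmetry factor α weights components where x_i > y_i, 1 − α weights those where x_i < y_i, and each squared difference is multiplied by its weight w_i ∈ [0, 1]. The function name is hypothetical.

```python
import math

def weighted_asymmetric_euclidean(x, y, weights, alpha):
    """Assumed form: alpha scales components with x_i > y_i, (1 - alpha) those
    with x_i < y_i; equal components contribute nothing either way."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    total = 0.0
    for xi, yi, wi in zip(x, y, weights):
        factor = alpha if xi > yi else 1.0 - alpha
        total += factor * wi * (xi - yi) ** 2
    return math.sqrt(total)

print(weighted_asymmetric_euclidean([3, 0], [0, 1], [1, 1], alpha=1.0))
print(weighted_asymmetric_euclidean([3, 0], [0, 1], [1, 1], alpha=0.0))
```

With α = 0.5 and unit weights, this reduces to the plain Euclidean distance multiplied by √0.5, so it ranks targets exactly as the basic metric does. Only moving α or the weights away from these values can change the hit list.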

Slide 18

Optimization of metrics

[Diagram: the selected targets are split into a training set and a test set; the known actives are split into a query set, a training set and a test set]

Step 1: optimize parameters for maximum enrichment.

Step 2: validate metrics over an independent test set.

Slide 19

Optimization of metrics

query set

training set

Step 1 optimize parameters for maximum enrichment

Target hits

Active hits

1111100010000100001000101000

query fingerprint
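Step 1 needs a number to maximize. The slides do not define enrichment. A common definition, assumed here, divides the fraction of actives among the hits by the fraction of actives in the screened set. `enrichment_factor` and its parameter names are hypothetical.

```python
def enrichment_factor(active_hits: int, target_hits: int, n_actives: int, n_targets: int) -> float:
    """Assumed definition: (actives among hits / all hits) divided by
    (actives screened / all compounds screened)."""
    hits = active_hits + target_hits
    if hits == 0:
        return 0.0  # no hits, no enrichment
    return (active_hits / hits) / (n_actives / (n_actives + n_targets))

# Hypothetical counts: 5 of 10 spiked actives found among 10 hits in 1000 compounds.
print(enrichment_factor(active_hits=5, target_hits=5, n_actives=10, n_targets=990))
```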

Slide 20

Optimization of metrics

One step of the algorithm

[Diagram: variables v1, v2, v3, ..., vi, ..., vn; legend: potential variable value, temporarily fixed value, running variable value, final value]
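The slide does not spell out the search procedure. One plausible reading of the diagram is a coordinate-wise search: each variable runs over its potential values while all other variables stay temporarily fixed, and the best value found is kept. The following is a minimal sketch under that assumption. It uses a hypothetical `coordinate_search` helper, with a toy objective standing in for enrichment.

```python
from typing import Callable, Sequence

def coordinate_search(
    objective: Callable[[list[float]], float],
    candidates: Sequence[Sequence[float]],
    start: Sequence[float],
    max_sweeps: int = 10,
) -> tuple[list[float], float]:
    """Maximize objective by varying one variable at a time over its candidate values."""
    values = list(start)
    best = objective(values)
    for _ in range(max_sweeps):
        improved = False
        for i, options in enumerate(candidates):
            for option in options:          # running value of variable i
                trial = values.copy()       # all other variables temporarily fixed
                trial[i] = option
                score = objective(trial)
                if score > best:
                    values, best = trial, score
                    improved = True
        if not improved:                    # a full sweep changed nothing: values are final
            break
    return values, best

# Toy objective with its maximum at (2, 3).
result, score = coordinate_search(
    lambda v: -((v[0] - 2) ** 2) - (v[1] - 3) ** 2,
    candidates=[[0, 1, 2, 3], [0, 1, 2, 3, 4]],
    start=[0, 0],
)
print(result, score)  # [2, 3] 0
```

A coordinate search evaluates the objective only a handful of times per variable. However, it can stop at a local optimum when parameters interact.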

Slide 21

Optimization of metrics

test set

Step 2 validate metrics over an independent test set

Target hits

Active hits

query set

1111100010000100001000101000

query fingerprint

Slide 22

Results

[Figure: example structures with dissimilarity values 0.47, 0.55, 0.57 and 0.28, 0.20, 0.06]

Similar structures get closer

Slide 23

Results

Hit set size reduction

Active set: 18 mGluR1 antagonists

Target set: 10000 randomly selected drug-like structures + 7 spikes

Slide 24

Metric / Enrichment / Test hits / Random hits

Tanimoto

Basic 70.47 5.43 172.00

Scaled 7.63 6.00 1101.71

Asymmetric 99.36 5.29 106.00

Scaled Asymmetric 11.94 5.86 731.14

Euclidean

Basic 5.59 5.43 1456.57

Normalized 11.33 5.14 791.29

Asymmetric Normalized 18.58 4.71 368.71

Weighted Normalized 296.30 4.14 27.57

Weighted Asymmetric Normalized 281.30 3.43 17.00

Results

Improvement by optimization

Slide 25

Active set size Euclidean Optimized Improvement ratio

5-HT3 12 12.55 239.24 49.26

ACE 89 1.42 6.50 4.64

Angiotensin2 10 27.81 85.45 11.15

Beta2 50 1.52 24.70 17.42

D2 13 27.64 123.25 11.19

delta 20 11.66 243.57 69.11

Ftp 35 46.88 71.54 5.35

mGluR1 18 5.59 296.30 70.93

NPY-5 139 1.00 3.22 3.25

Thrombin 8 2.56 4.57 2.62

Results

Active Hit Distribution

offers a more intuitive way to evaluate the efficiency of screening: random set hits and known actives are sorted by their dissimilarity values, and the number of random set hits preceding each active in the sorted list is counted

[Figure: sorted dissimilarity values 0.014, 0.015, 0.017, 0.020, 0.022, 0.023, 0.027, 0.041, 0.043; axes: number of actives, number of virtual hits]
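A minimal sketch of the active hit distribution. For each active, taken in order of increasing dissimilarity, it counts the random-set hits with a strictly smaller dissimilarity. Treating ties as "not preceding" is an assumption, and the input values are hypothetical.

```python
from bisect import bisect_left

def active_hit_distribution(active_scores, random_scores):
    """For each active (sorted by dissimilarity), the number of random-set hits
    ranked before it; random hits with an equal score are not counted."""
    randoms = sorted(random_scores)
    return [bisect_left(randoms, score) for score in sorted(active_scores)]

# Hypothetical dissimilarity values, not taken from the study.
counts = active_hit_distribution([0.1, 0.3, 0.6], [0.05, 0.2, 0.25, 0.5, 0.7])
print(counts)  # [1, 3, 4]
```

To retrieve the k-th active, a screen has to accept k + counts[k-1] hits in total. This presumably corresponds to the "number of hits" plotted against the number of actives in the following charts.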

Slide 26

Results

ACE (pharmacophore similarity)

[Figure: number of hits (log scale, 1 to 10000) versus number of actives among the hits (1 to 16); series: Euclidean, Optimized Euclidean]

Slide 27

Results

NPY-5 (pharmacophore similarity)

[Figure: number of hits (log scale, 1 to 10000) versus number of active hits (1 to 49); series: Tanimoto, Euclidean, Optimized, Ideal]

Slide 28

Results

β2-adrenoceptor (pharmacophore similarity)

[Figure: number of hits (log scale, 1 to 10000) versus number of active hits (1 to 18); series: Tanimoto, Euclidean, Optimized, Ideal]

Slide 29

Results

Structural or pharmacophore fingerprint?

Slide 30

* Average of (1 − Tanimoto coefficient) over all pairs of compounds in the active set, based on the chemical fingerprint.

Active set size chemical pharmacophore diversity*

5-HT3 12 692.21 239.24 0.30

ACE 89 4.29 6.50 0.56

Angiotensin2 10 190.76 85.45 0.40

Beta2 50 10.98 24.70 0.50

D2 13 358.10 123.25 0.30

delta 20 249.40 243.57 0.32

Ftp 35 575.16 71.54 0.30

mGluR1 18 350.86 296.30 0.37

NPY-5 139 1.52 3.22 0.47

Thrombin 8 3.59 4.57 0.46
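The diversity column defined in the footnote can be sketched as follows, assuming chemical fingerprints stored as integer bit sets. The function names are hypothetical.

```python
from itertools import combinations

def tanimoto(x: int, y: int) -> float:
    union = bin(x | y).count("1")
    return bin(x & y).count("1") / union if union else 1.0

def diversity(fingerprints: list[int]) -> float:
    """Average (1 - Tanimoto) over all pairs of compounds in the active set."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        raise ValueError("need at least two fingerprints")
    return sum(1.0 - tanimoto(x, y) for x, y in pairs) / len(pairs)

print(diversity([0b1100, 0b1010, 0b0110]))
```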

Results

Scaffold hopping

Slide 31

Acknowledgements

Contributors: Nóra Máté, Szilárd Dóránt, Bernard Przybylski (Axovan)

The research was supported by

Slide 32

