Optimized Virtual Screening
Miklós Vargyas, Zsuzsanna Szabó, György Pirok, Ferenc Csizmadia
ChemAxon Ltd.
Matthias Steger, Modest von Korff
AXOVAN AG, Allschwil, Switzerland
(Axovan is now Actelion.)
Slide 1
Drug research
structures found
corporate database
Is it searching for a needle in a haystack?
Slide 2
structures found (virtual hits)
query structures (known actives)
corporate database (targets)
Find something similar to a fistful of needles
Drug research
Slide 3
Molecular similarity
Chemical, pharmacological or biological properties of two compounds match.
The more the common features, the higher the similarity between two molecules.
Chemical
Pharmacophore
What is it?
Slide 4
Molecular similarity
How to calculate it?
$$T(x,y) = \frac{B(x \mathbin{\&} y)}{B(x) + B(y) - B(x \mathbin{\&} y)}$$
$$E(x,y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
Sequences/vectors of bits, or numeric values that can be compared by distance functions, similarity metrics.
Quantitative assessment of similarity/dissimilarity of structures
need a numerically tractable form
molecular descriptors, fingerprints, structural keys
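The two measures on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming that B(x) denotes the number of bits set in a binary fingerprint x and that numeric descriptors are plain lists of equal length; the example values are invented for the demonstration.

```python
from math import sqrt

def tanimoto(x: int, y: int) -> float:
    """T(x, y) = B(x & y) / (B(x) + B(y) - B(x & y)) for bit strings stored as ints."""
    common = bin(x & y).count("1")
    total = bin(x).count("1") + bin(y).count("1") - common
    return common / total if total else 1.0  # two empty fingerprints: treated as identical

def euclidean(x, y) -> float:
    """E(x, y) = sqrt(sum_i (x_i - y_i)^2) for numeric vectors of equal length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Illustrative inputs (not taken from the slides)
a = int("1101000110", 2)
b = int("1001000111", 2)
print(tanimoto(a, b))                          # 4 common bits out of 6 set in either
print(euclidean([0, 2, 7, 1], [1, 6, 0, 4]))   # sqrt(75)
```

Tanimoto is a similarity (1 means identical bit sets), whereas Euclidean is a distance (0 means identical vectors), so the two are read in opposite directions.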
Slide 5
hashed binary fingerprint
encodes topological properties of the chemical graph: connectivity, edge label (bond type), node label (atom type)
allows the comparison of two molecules with respect to their chemical structure
Molecular descriptors
Example 1: chemical fingerprint
Construction
1. find all 0, 1, …, n step walks in the chemical graph
2. generate a bit array for each walk with a given number of bits set
3. merge the bit arrays with logical OR operation
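The three construction steps can be sketched as follows. This is a toy illustration, not ChemAxon's implementation: the hash (`zlib.crc32`), the fingerprint width, the number of bits per walk, and the choice to enumerate only walks that do not revisit an atom are all assumptions of the sketch, as are the function names.

```python
import zlib

def paths_from(start, adj, labels, max_steps):
    """Step 1: yield label strings of all walks of 0..max_steps bonds from `start`.
    Assumption of this sketch: a walk never revisits an atom."""
    stack = [(start, (start,), labels[start])]
    while stack:
        atom, visited, text = stack.pop()
        yield text
        if len(visited) - 1 < max_steps:
            for nbr, bond in adj[atom]:
                if nbr not in visited:
                    stack.append((nbr, visited + (nbr,), text + bond + labels[nbr]))

def bit_array(text, n_bits, bits_per_walk):
    """Step 2: hash one walk into an n_bits-wide mask with at most bits_per_walk bits set."""
    mask = 0
    for k in range(bits_per_walk):
        mask |= 1 << (zlib.crc32(f"{k}:{text}".encode()) % n_bits)
    return mask

def fingerprint(adj, labels, max_steps=3, n_bits=64, bits_per_walk=2):
    """Step 3: OR together the bit arrays of all walks from all atoms."""
    fp = 0
    for atom in adj:
        for text in paths_from(atom, adj, labels, max_steps):
            fp |= bit_array(text, n_bits, bits_per_walk)
    return fp

# Heavy-atom graph of ethanol: atoms 0 (C), 1 (C), 2 (O), single bonds written as "-"
labels = {0: "C", 1: "C", 2: "O"}
adj = {0: [(1, "-")], 1: [(0, "-"), (2, "-")], 2: [(1, "-")]}
fp = fingerprint(adj, labels)
print(format(fp, "064b"))
```

Because different walks can hash to the same bit, a set bit does not identify a unique feature; this is the usual trade-off of hashed fingerprints against fixed structural keys.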
Slide 6
Molecular descriptors
Example 1: chemical fingerprint
Example
CH3 – CH2 – OH
walks from the first carbon atom
length walk bit array
0 C 1010000000
1 C – H 0001010000
1 C – C 0001000100
2 C – C – H 0001000010
2 C – C – O 0100010000
3 C – C – O – H 0000011000
merge bit arrays for the first carbon atom: 1111011110
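The merge step for the first carbon atom can be checked directly: OR-ing the six bit arrays from the table gives the merged array shown on the slide. The snippet below only reproduces that arithmetic.

```python
from functools import reduce

# (walk, bit array) pairs exactly as listed in the table above
walk_bits = [
    ("C",       "1010000000"),
    ("C-H",     "0001010000"),
    ("C-C",     "0001000100"),
    ("C-C-H",   "0001000010"),
    ("C-C-O",   "0100010000"),
    ("C-C-O-H", "0000011000"),
]

# Logical OR over all bit arrays
merged = reduce(lambda acc, bits: acc | int(bits, 2), (b for _, b in walk_bits), 0)
print(format(merged, "010b"))  # 1111011110
```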
Slide 7
Molecular descriptors
Example 1: chemical fingerprint
0100010100010100010000000001101010011010100000010100000000100000
0100010100010100010000000001101010011010100000000100000000100000
Slide 8
Molecular descriptors
Example 2: pharmacophore fingerprint
encodes pharmacophore properties of molecules as frequency counts of pharmacophore point pairs at given topological distance
allows the comparison of two molecules with respect to their pharmacophore
Construction
1. map pharmacophore point types to atoms
2. calculate the length of the shortest path between each pair of atoms
3. assign a histogram to each pharmacophore point pair and count the frequency of the pair with respect to its distance
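A minimal sketch of these three steps, assuming each atom carries at most one pharmacophore type (A = acceptor, D = donor, H = hydrophobic) and that distances are counted in bonds up to a hypothetical cutoff of 6. The bin names follow the pattern pair type plus distance (for example DA3); the graph, the type assignment and the function names are invented for illustration.

```python
from collections import Counter, deque
from itertools import combinations

def shortest_paths(adj, source):
    """Breadth-first search: topological distance (in bonds) from source to every atom."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def pharmacophore_fingerprint(adj, types, max_dist=6):
    """Count pharmacophore point pairs per (pair type, distance) bin."""
    dist = {a: shortest_paths(adj, a) for a in adj}
    counts = Counter()
    for a, b in combinations(sorted(adj), 2):
        ta, tb = types.get(a), types.get(b)
        d = dist[a].get(b)
        if ta and tb and d is not None and 1 <= d <= max_dist:
            pair = "".join(sorted((ta, tb), reverse=True))  # yields AA, DA, DD, HA, HD, HH
            counts[f"{pair}{d}"] += 1
    return counts

# Toy linear chain 0-1-2-3; atom 2 has no pharmacophore type
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
types = {0: "D", 1: "H", 3: "A"}
print(pharmacophore_fingerprint(adj, types))
```

Unlike the hashed chemical fingerprint, the result is a vector of counts, so it is naturally compared with count-based metrics such as the Euclidean distance.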
Slide 9
Molecular descriptors
Example 2: pharmacophore fingerprint
Pharmacophore point type based coloring of atoms: acceptor, donor, hydrophobic, none.
[Figure: two histograms of pharmacophore point pair frequencies. Horizontal axis: pair type and distance (AA1 to AA6, DA1 to DA6, DD1 to DD6, HA1 to HA6, HD1 to HD6, HH1 to HH6); vertical axis: count, 0 to 12.]
Slide 10
Virtual screening using fingerprints
Individual query structure
[Diagram labels: query, query fingerprint (0101010100010100010100100000000000010010000010010100100100010000), targets, target fingerprints, proximity, hits]
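The screening flow with a single query can be sketched as below: compute the proximity of the query fingerprint to every target fingerprint and keep the targets that are close enough. The sketch assumes binary fingerprints, 1 - Tanimoto as the dissimilarity, and a hypothetical threshold of 0.3; the 16-bit fingerprints and target names are invented.

```python
def popcount(v: int) -> int:
    return bin(v).count("1")

def dissimilarity(x: int, y: int) -> float:
    """1 - Tanimoto coefficient of two bit strings stored as ints."""
    union = popcount(x | y)
    return 1.0 - (popcount(x & y) / union if union else 1.0)

def screen(query_fp, targets, threshold):
    """Return (dissimilarity, name) for every target within the threshold, closest first."""
    scored = ((dissimilarity(query_fp, fp), name) for name, fp in targets.items())
    return sorted((d, name) for d, name in scored if d <= threshold)

query = int("0101010100010100", 2)
targets = {
    "t1": int("0101010100010100", 2),  # identical to the query
    "t2": int("0101010100010000", 2),  # one query bit missing
    "t3": int("1010101011101011", 2),  # bitwise complement: nothing in common
}
hits = screen(query, targets, threshold=0.3)
print(hits)
```

The threshold controls the size of the hit set, which is why the choice of metric and its parameters matters for the number of virtual hits.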
Slide 11
Virtual screening using fingerprints
Multiple query structures
[Diagram labels: queries, hypothesis fingerprint (0101110100110101010111111000010000011111100010000100001000101000), targets, target fingerprints, proximity, hits]
Slide 12
Hypothesis fingerprints
Advantages
• allows faster operation
• compiles features common to the individual actives
Hypothesis types
Active 1   0    2  7     1  0  1  6  4  0  0     9  0
Active 2   1    6  0     4  3  3  1  2  2  0     5  1
Active 3   2    4  4     1  0  2  5  3  4  3     4  5
Minimum    0    2  0     1  0  1  1  2  0  0     4  0
Average    1    4  3.67  2  1  2  4  3  2  1.33  6  2
Median     1.5  4  5.5   1  0  2  5  3  3  0     5  3
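The hypothesis types can be reproduced from the three active fingerprints in the table. The minimum and average are plain column-wise operations. The Median row is not a plain median; the rule used below (0 when most actives lack the feature, otherwise the median of the non-zero counts) is an assumption inferred from the numbers, and it reproduces the Median row as transcribed. The plain mean agrees with the Average row except in the tenth column, where the mean of 0, 0, 3 is 1 rather than the transcribed 1.33.

```python
from statistics import mean, median

actives = [
    [0, 2, 7, 1, 0, 1, 6, 4, 0, 0, 9, 0],
    [1, 6, 0, 4, 3, 3, 1, 2, 2, 0, 5, 1],
    [2, 4, 4, 1, 0, 2, 5, 3, 4, 3, 4, 5],
]

def minimum_hypothesis(fps):
    """Column-wise minimum: a feature count must be present in every active."""
    return [min(col) for col in zip(*fps)]

def average_hypothesis(fps):
    """Column-wise arithmetic mean, rounded to two decimals."""
    return [round(mean(col), 2) for col in zip(*fps)]

def median_hypothesis(fps):
    """Zero-aware median (assumed rule): 0 if most actives lack the feature,
    otherwise the median of the non-zero counts."""
    result = []
    for col in zip(*fps):
        present = [v for v in col if v > 0]
        result.append(median(present) if 2 * len(present) > len(col) else 0)
    return result

print(minimum_hypothesis(actives))
print(average_hypothesis(actives))
print(median_hypothesis(actives))
```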
Slide 13
Hypothesis fingerprints
Minimum
Advantages:
• strict conditions for hits if actives are fairly similar
Disadvantages:
• false results with asymmetric metrics
• misses common features of highly diverse sets
• very sensitive to one missing feature
Average
Advantages:
• captures common features of more diverse active sets
Disadvantages:
• less selective if actives are very similar
Median
Advantages:
• captures common features of more diverse active sets
• specific treatment of the absence of a feature
• less sensitive to outliers
Disadvantages:
• less selective if actives are very similar
Slide 14
Does this work?
Slide 15
                      Pharmacophore fingerprint   Chemical fingerprint
Active set     size   Tanimoto   Euclidean        Tanimoto   Euclidean
5-HT3            12      20.14       12.55          776.19      461.44
ACE              89       1.99        1.42            3.71        1.74
Angiotensin2     10      22.80       27.81          183.45      173.91
Beta2            50       3.59        1.52            7.52        2.65
D2               13      61.25       27.64          302.52      155.61
delta            20     109.53       11.66          114.48       56.22
Ftp              35      50.92       46.88          571.50      575.16
mGluR1           18      70.47        5.59          347.72      130.14
NPY-5           139       1.09        1.00            1.46        1.44
Thrombin          8       2.46        2.56            3.71        1.67
Then why do we need optimization?
Too many hits
Slide 16
Then why do we need optimization?
Inconsistent dissimilarity values
[Figure: example structures annotated with dissimilarity values 0.47, 0.55 and 0.57]
Slide 17
What can be optimized?
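$$D^{\text{weighted, asymmetric}}_{\text{Euclidean}}(x,y) = \sqrt{\alpha \sum_{x_i > y_i} w_i (x_i - y_i)^2 + (1-\alpha) \sum_{x_i \le y_i} w_i (x_i - y_i)^2}$$
$\alpha \in [0,1]$: asymmetry factor
$w_i \in [0,1]$: weights
$$D^{\text{scaled, asymmetric}}_{\text{Tanimoto}}(x,y) = 1 - \frac{\sum_i s_i \min(x_i, y_i)}{\alpha \sum_i s_i \left(x_i - \min(x_i, y_i)\right) + (1-\alpha) \sum_i s_i \left(y_i - \min(x_i, y_i)\right) + \sum_i s_i \min(x_i, y_i)}$$
$\alpha \in [0,1]$: asymmetry factor
$s_i \in \mathbb{N}$: scaling factor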
Parameterized metrics
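The parameterized metrics can be sketched in Python as follows. The formulas on this slide are damaged in the transcript, so the code implements one reading of them: a Euclidean distance in which weighted squared differences with x_i > y_i are multiplied by the asymmetry factor alpha and the remaining ones by 1 - alpha, and a Tanimoto dissimilarity in which scaled features found only in x are weighted by alpha and those found only in y by 1 - alpha. Which side alpha applies to is an assumption, and the function names are hypothetical.

```python
from math import sqrt

def weighted_asymmetric_euclidean(x, y, w, alpha):
    """sqrt(alpha * sum_{x_i > y_i} w_i (x_i - y_i)^2
            + (1 - alpha) * sum_{x_i <= y_i} w_i (x_i - y_i)^2)"""
    over = sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w) if xi > yi)
    under = sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w) if xi <= yi)
    return sqrt(alpha * over + (1 - alpha) * under)

def scaled_asymmetric_tanimoto(x, y, s, alpha):
    """1 - common / (alpha * only_x + (1 - alpha) * only_y + common),
    where every term is scaled feature-wise by s_i."""
    common = sum(si * min(xi, yi) for xi, yi, si in zip(x, y, s))
    only_x = sum(si * xi for xi, si in zip(x, s)) - common
    only_y = sum(si * yi for yi, si in zip(y, s)) - common
    denom = alpha * only_x + (1 - alpha) * only_y + common
    return 1.0 - (common / denom if denom else 1.0)

# Illustrative count vectors with unit weights and unit scaling
x, y = [3, 0], [1, 4]
ones = [1, 1]
print(weighted_asymmetric_euclidean(x, y, ones, alpha=1.0))  # only x_i > y_i terms count
print(weighted_asymmetric_euclidean(x, y, ones, alpha=0.0))  # only x_i <= y_i terms count
print(scaled_asymmetric_tanimoto(x, y, ones, alpha=1.0))
```

With alpha = 0.5 both functions are symmetric in x and y; moving alpha towards 0 or 1 penalizes features missing from one side more than features missing from the other.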
Slide 18
Optimization of metrics
selected targets: training set, test set
known actives: query set, training set, test set
Step 1: optimize parameters for maximum enrichment
Step 2: validate metrics over an independent test set
Slide 19
Optimization of metrics
Step 1: optimize parameters for maximum enrichment
[Diagram labels: query set, query fingerprint (1111100010000100001000101000), training set, target hits, active hits]
Slide 20
Optimization of metrics
One step of the algorithm
[Diagram: variables v1, v2, v3, ..., vi, ..., vn; legend: potential variable value, temporarily fixed value, running variable value, final value]
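The legend of this slide (potential, temporarily fixed, running and final variable values) suggests a coordinate-wise search in which one variable is varied over candidate values while the others are held fixed. The sketch below follows that reading; it is an assumption, not the authors' confirmed algorithm. The objective is a toy stand-in for enrichment, and the function names, candidate grid and number of sweeps are hypothetical.

```python
def coordinate_search(objective, start, candidates, sweeps=2):
    """Maximize objective(values) by varying one variable at a time."""
    values = list(start)
    best = objective(values)
    for _ in range(sweeps):
        for i in range(len(values)):                   # running variable
            for v in candidates:                       # potential values
                trial = values[:i] + [v] + values[i + 1:]  # other variables temporarily fixed
                score = objective(trial)
                if score > best:                       # keep strictly better settings only
                    best, values = score, trial
    return values, best

# Toy objective standing in for enrichment: highest at a known parameter vector
target = [0.2, 0.8, 0.5]

def toy_objective(v):
    return -sum((a - b) ** 2 for a, b in zip(v, target))

grid = [i / 10 for i in range(11)]
values, score = coordinate_search(toy_objective, [0.0, 0.0, 0.0], grid)
print(values, score)
```

A search of this kind needs only evaluations of the objective, which suits enrichment since it is computed by running a screen rather than from a closed formula; it can stall in local optima when parameters interact.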
Slide 21
Optimization of metrics
Step 2: validate metrics over an independent test set
[Diagram labels: query set, query fingerprint (1111100010000100001000101000), test set, target hits, active hits]
Slide 22
Results
Similar structures get closer
[Figure: example structures annotated with dissimilarity values 0.47, 0.55, 0.57 and 0.28, 0.20, 0.06]
Slide 23
Results
Hit set size reduction
Active set: 18 mGlu-R1 antagonists
Target set: 10000 randomly selected drug-like structures + 7 spikes
Slide 24
Metric                              Enrichment   Test hits   Random hits
Tanimoto
  Basic                                  70.47        5.43        172.00
  Scaled                                  7.63        6.00       1101.71
  Asymmetric                             99.36        5.29        106.00
  Scaled Asymmetric                      11.94        5.86        731.14
Euclidean
  Basic                                   5.59        5.43       1456.57
  Normalized                             11.33        5.14        791.29
  Asymmetric Normalized                  18.58        4.71        368.71
  Weighted Normalized                   296.30        4.14         27.57
  Weighted Asymmetric Normalized        281.30        3.43         17.00
Results
Improvement by optimization
Slide 25
Active set     size   Euclidean   Optimized   Improvement ratio
5-HT3            12       12.55      239.24               49.26
ACE              89        1.42        6.50                4.64
Angiotensin2     10       27.81       85.45               11.15
Beta2            50        1.52       24.70               17.42
D2               13       27.64      123.25               11.19
delta            20       11.66      243.57               69.11
Ftp              35       46.88       71.54                5.35
mGluR1           18        5.59      296.30               70.93
NPY-5           139        1.00        3.22                3.25
Thrombin          8        2.56        4.57                2.62
Results
Active Hit Distribution
• offers a more intuitive way to evaluate the efficiency of screening
• based on sorting random set hits and known actives on dissimilarity values and counting the number of random set hits preceding each active in the sorted list
[Figure: sorted list of dissimilarity values 0.014, 0.015, 0.017, 0.020, 0.022, 0.023, 0.027, 0.041, 0.043, annotated with number of actives and number of virtual hits]
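The counting procedure described above can be sketched as follows: sort the random-set hits by dissimilarity, then for each active (in order of increasing dissimilarity) count how many random-set hits come before it. The dissimilarity values are those shown in the figure, but which of them belong to actives is not recoverable from the transcript, so the split below is invented; ties are resolved in favour of the active, which is also an assumption.

```python
from bisect import bisect_left

def active_hit_distribution(active_scores, random_scores):
    """For each active, sorted by dissimilarity, return the number of
    random-set hits with a strictly smaller dissimilarity."""
    ranked = sorted(random_scores)
    return [bisect_left(ranked, d) for d in sorted(active_scores)]

# Hypothetical split of the figure's values into actives and random-set hits
actives = [0.014, 0.020, 0.041]
randoms = [0.015, 0.017, 0.022, 0.023, 0.027, 0.043]

print(active_hit_distribution(actives, randoms))  # [0, 2, 5]
```

Read as a curve, the k-th entry is the number of random-set hits that must be accepted in order to retrieve k actives; an ideal metric would give zeros throughout.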
Slide 26
Results
ACE (pharmacophore similarity)
[Chart: number of hits (logarithmic axis, 1 to 10000) against number of actives among the hits (1 to 16). Series: Euclidean, Optimized Euclidean.]
Slide 27
Results
NPY-5 (pharmacophore similarity)
[Chart: number of hits (logarithmic axis, 1 to 10000) against number of active hits (1 to 49). Series: Tanimoto, Euclidean, Optimized, Ideal.]
Slide 28
Results
β2-adrenoceptor (pharmacophore similarity)
[Chart: number of hits (logarithmic axis, 1 to 10000) against number of active hits (1 to 18). Series: Tanimoto, Euclidean, Optimized, Ideal.]
Slide 29
Results
Structural or pharmacophore fingerprint?
Slide 30
Active set     size   chemical   pharmacophore   diversity*
5-HT3            12     692.21          239.24         0.30
ACE              89       4.29            6.50         0.56
Angiotensin2     10     190.76           85.45         0.40
Beta2            50      10.98           24.70         0.50
D2               13     358.10          123.25         0.30
delta            20     249.40          243.57         0.32
Ftp              35     575.16           71.54         0.30
mGluR1           18     350.86          296.30         0.37
NPY-5           139       1.52            3.22         0.47
Thrombin          8       3.59            4.57         0.46
* Average 1-Tanimoto coefficient between each pair of compounds in the active set, based on chemical fingerprint.
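The diversity column is defined in the footnote as the average of 1 - Tanimoto over all pairs of compounds in the active set. A minimal sketch of that calculation, assuming binary chemical fingerprints stored as integers (the three 4-bit fingerprints are invented):

```python
from itertools import combinations

def tanimoto(x: int, y: int) -> float:
    """Bits in common divided by bits set in either fingerprint."""
    union = bin(x | y).count("1")
    return bin(x & y).count("1") / union if union else 1.0

def diversity(fingerprints):
    """Average 1 - Tanimoto over all unordered pairs (needs at least two fingerprints)."""
    pairs = list(combinations(fingerprints, 2))
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)

active_set = [0b1100, 0b0110, 0b0011]
print(diversity(active_set))  # (2/3 + 1 + 2/3) / 3 = 7/9
```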
Results
Scaffold hopping
Slide 31
Acknowledgements
Contributors:
Nóra Máté
Szilárd Dóránt
Bernard Przybylski (Axovan)
The research was supported by
Slide 32
(Axovan is now part of Actelion.)