ChEBI Kirill Degtyarenko, EMBL-EBI / EPO. Rafael Alcántara Michael Ashburner * Volker Ast * Michael...

Post on 15-Jan-2016


ChEBI

Kirill Degtyarenko, EMBL-EBI / EPO

• Rafael Alcántara
• Michael Ashburner *
• Volker Ast *
• Michael Darsow *
• Paula de Matos
• Marcus Ennis
• Janna Hastings
• Alan McNaught *
• Inma Spiteri
• Christoph Steinbeck
• Martin Zbinden *

The team

Chemical Entities of Biological Interest – an EBI database/dictionary of ‘biochemical compounds’

ChEBI: What is it?

Can be defined as consisting of

“molecules not directly encoded by the genome ... that are either the products of nature or are synthetic products used ... to intervene in the processes of living organisms”

[Michael Ashburner]

What are the ‘biochemical compounds’?

“Any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex, conformer etc., identifiable as a separately distinguishable entity”

[IUPAC “Gold Book”]

Molecular entity

• Molecular entities: trans-vaccenic acid

• Groups: trans-vaccenoyl group

• Classes: fatty acids

In fact, ChEBI contains

‘Small molecules’?

Yes, but big molecules as well!

• alumina

• amylose

• metaborate

• poly(vinyl alcohol)

Current status (17.12.08)

[Bar chart: ChEBI content by data type]

• Structures: 14,274
• Database Links: 9,196
• Formulae: 13,163
• Registry Numbers: 15,773
• IUPAC names: 14,847
• Synonyms: 43,880
• ChEBI entries: 16,618

1-D ChEBI

• Numeric ID

• Carefully checked terminology

• Unambiguous ChEBI name

• IUPAC names

• Cross-references to free resources

Unambiguous ChEBI name

CHEBI:28918

L-adrenaline

not just ‘adrenaline’
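The numeric ID and the unambiguous name travel together. A minimal Python sketch of handling an accession such as CHEBI:28918 follows; the function name `chebi_number` and the one-entry `NAMES` table are hypothetical, introduced only for illustration, and the pattern simply mirrors the "CHEBI:" plus digits form shown on the slides.

```python
import re

# Accession form as written on the slides: "CHEBI:" followed by digits.
CHEBI_ID = re.compile(r"CHEBI:(\d+)")


def chebi_number(accession: str) -> int:
    """Return the numeric part of an accession such as 'CHEBI:28918'."""
    match = CHEBI_ID.fullmatch(accession.strip())
    if match is None:
        raise ValueError(f"not a ChEBI accession: {accession!r}")
    return int(match.group(1))


# Hypothetical lookup table holding only the entry shown on the slide.
NAMES = {28918: "L-adrenaline"}

print(NAMES[chebi_number("CHEBI:28918")])
```

A bare name such as 'adrenaline' is rejected by this sketch, which is the point of the slide: the identifier, not the trivial name, selects the entry.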

2-{[3-(trifluoromethyl)phenyl]amino}benzoic acid

[Structure diagram of 2-{[3-(trifluoromethyl)phenyl]amino}benzoic acid, with locants 1 to 6 numbered on each benzene ring]

Systematic Name (IUPAC)

• flufenamic acid (INN English)
• acide flufénamique (INN French)
• ácido flufenámico (INN Spanish)
• acidum flufenamicum (INN Latin)
• Flufenaminsäure (German)

[Structure diagram: flufenamic acid]

Common Name

The Unpronounceables

CHEBI:48935

(E)-roxithromycin

IUPAC name:

(3R,4S,5S,6R,7R,9R,10E,11S,12R,13S,14R)-4-(2,6-dideoxy-3-C-methyl-3-O-methyl-α-L-ribo-hexopyranosyloxy)-14-ethyl-7,12,13-trihydroxy-10-{[(2-methoxyethoxy)methoxy]imino}-6-[3,4,6-trideoxy-3-(dimethylamino)-β-D-xylo-hexopyranosyloxy]-3,5,7,9,11,13-hexamethyloxacyclotetradecan-2-one

[Structure diagrams: (E)-roxithromycin and (Z)-roxithromycin]

CHEBI:32109
(Z)-roxithromycin

What is the common name of roxithromycin?

CHEBI:48935
(E)-roxithromycin
INN: roxithromycin

[Structure diagrams: roxithromycin isomers]

CHEBI:48844 roxithromycin

(E)-roxithromycin

[Structure diagram]

(Z)-roxithromycin

What is thiamine?

CHEBI:18385
thiamine(1+)
aka thiamine

[Structure diagram: thiamine(1+)]

CHEBI:33283
thiamine(1+) chloride
INN: thiamine

[Structure diagram: thiamine(1+) chloride]

CHEBI:49105
thiamine(2+) dichloride
aka thiamine chloride hydrochloride
aka thiamine hydrochloride

[Structure diagram: thiamine(2+) dichloride]

• “Better to see the face than to hear the name” (Zen proverb)

• Structures and identifiers based on structures offer new ways of crosslinking to other databases

• Structure search

Need for 2-D

ChEBI

  9 10  0  0  0  0            999 V2000
   11.8219   -7.2713    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   11.8219   -8.0922    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   12.6074   -7.0165    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   11.1072   -6.8574    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   12.6039   -8.3505    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   11.1072   -8.5027    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   13.0886   -7.6818    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   10.3923   -7.2713    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   10.3888   -8.0922    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  1  3  1  0  0  0  0
  1  4  1  0  0  0  0
  2  5  1  0  0  0  0
  2  6  1  0  0  0  0
  3  7  1  0  0  0  0
  4  8  2  0  0  0  0
  6  9  2  0  0  0  0
  5  7  2  0  0  0  0
  8  9  1  0  0  0  0
M  END
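To make the connection table concrete, here is a minimal Python sketch that reads the counts line, atom block and bond block shown on this slide. It is not a full molfile reader: it assumes whitespace-separated fields and no header lines, whereas real molfiles are fixed-width and carry a three-line header. The function name `parse_ctab` is hypothetical, and only the first four columns of each atom line are kept.

```python
from collections import Counter

# Connection table from the slide: counts line, 9 atoms (x y z symbol), 10 bonds.
CTAB = """\
9 10 0 0 0 0 999 V2000
11.8219 -7.2713 0.0000 C
11.8219 -8.0922 0.0000 C
12.6074 -7.0165 0.0000 N
11.1072 -6.8574 0.0000 C
12.6039 -8.3505 0.0000 N
11.1072 -8.5027 0.0000 N
13.0886 -7.6818 0.0000 C
10.3923 -7.2713 0.0000 N
10.3888 -8.0922 0.0000 C
1 2 2
1 3 1
1 4 1
2 5 1
2 6 1
3 7 1
4 8 2
6 9 2
5 7 2
8 9 1
M END
"""


def parse_ctab(text):
    """Split a whitespace-separated connection table into atoms and bonds."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_atoms, n_bonds = (int(tok) for tok in lines[0].split()[:2])
    atoms = []
    for ln in lines[1:1 + n_atoms]:
        x, y, z, symbol = ln.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    bonds = []
    for ln in lines[1 + n_atoms:1 + n_atoms + n_bonds]:
        first, second, order = (int(tok) for tok in ln.split()[:3])
        bonds.append((first, second, order))
    return atoms, bonds


atoms, bonds = parse_ctab(CTAB)
print(Counter(symbol for symbol, *_ in atoms))
print(sum(1 for *_, order in bonds if order == 2), "double bonds")
```

The table lists only heavy atoms and their 2-D coordinates; hydrogens are implicit, which is why the element count shows five carbons and four nitrogens and nothing else.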

Connection table

[Structure diagram: purine]

2-D ChEBI

• One or more 2-D (or 3-D) connection tables

• One is default

• Autogenerated images (PNG)

• Default diagrams should be unambiguous

The Fine Art of chemical drawing

Linear forms of monosaccharides

[Structure diagrams: alternative drawings of a linear (open-chain) monosaccharide]

Pyranose forms of monosaccharides

[Structure diagrams: alternative drawings of a pyranose ring]

Fused systems

(R)-camphor

[Structure diagrams of (R)-camphor: one ambiguous drawing, one unambiguous drawing]

Square planar geometry

[Structure diagrams: cisplatin and transplatin]

• SMILES
• InChI

From 2-D back to 1-D

• Simplified Molecular Input Line Entry System
• Developed by David Weininger in 1988
• Extended by others (e.g. Daylight)
• String of standard ASCII characters
• A number of valid SMILES can be produced for the same molecule

SMILES (1)

SMILES (2)

[Structure diagram: purine]

• N1C=NC2=C1C=NC=N2
• c1ncc2ncnc2n1
• C=1N\C=N/C\2=N/C=N\C=1/2
• c1ncnc2/N=C\Nc12
• n1cc2c(nc1)ncn2
• [H]c1nc([H])c2n([H])c([H])nc2n1
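A quick way to see that the SMILES strings on this slide can all describe one molecule is to count their heavy atoms. The Python sketch below does only that: it is not a SMILES parser, it covers just the organic-subset and simple bracket atoms, and matching atom counts are a necessary but not sufficient condition for two strings to encode the same structure. The helper name `heavy_atoms` is hypothetical.

```python
import re
from collections import Counter

# The six SMILES strings shown on the slide (raw strings keep the backslashes).
SMILES = [
    r"N1C=NC2=C1C=NC=N2",
    r"c1ncc2ncnc2n1",
    r"C=1N\C=N/C\2=N/C=N\C=1/2",
    r"c1ncnc2/N=C\Nc12",
    r"n1cc2c(nc1)ncn2",
    r"[H]c1nc([H])c2n([H])c([H])nc2n1",
]

# Bracket atoms, two-letter halogens, then one-letter aliphatic and aromatic atoms.
ATOM = re.compile(r"\[([^\]]+)\]|Cl|Br|[BCNOPSFI]|[bcnops]")
# Inside brackets: optional isotope digits, then the element symbol.
BRACKET_SYMBOL = re.compile(r"\d*([A-Z][a-z]?|[a-z])")


def heavy_atoms(smiles):
    """Count non-hydrogen atoms per element in a simple SMILES string."""
    counts = Counter()
    for match in ATOM.finditer(smiles):
        if match.group(1) is not None:
            symbol = BRACKET_SYMBOL.match(match.group(1)).group(1)
        else:
            symbol = match.group(0)
        symbol = symbol.capitalize()  # aromatic 'c' and 'n' become 'C' and 'N'
        if symbol != "H":
            counts[symbol] += 1
    return counts


for s in SMILES:
    print(s, dict(heavy_atoms(s)))
```

Every string yields five carbons and four nitrogens, even though the strings differ in aromatic versus Kekulé notation, starting atom, and explicit hydrogens.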

InChI (1)

• IUPAC International Chemical Identifier or InChI
• Open source
• Developed by Stein, Heller, Tchekhovskoi and McNaught
• Used by NIST, PubChem, CML… and ChEBI

InChI (2)

[Structure diagram: purine]

InChI=1/C5H4N4/c1-4-5(8-2-6-1)9-3-7-4/h1-3H,(H,6,7,8,9)/f/h7H

InChIKey=KDCGOANMDULRCW-QDQILVOLCG
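The InChI string is built from slash-separated layers, each introduced by a one-letter prefix. The Python sketch below splits the string from this slide into its version, formula and remaining layers. It is a plain string split under that assumption, not a validating InChI parser, and the function name `split_inchi` is hypothetical.

```python
# InChI string shown on the slide.
INCHI = "InChI=1/C5H4N4/c1-4-5(8-2-6-1)9-3-7-4/h1-3H,(H,6,7,8,9)/f/h7H"


def split_inchi(inchi):
    """Return (version, formula, [(prefix, value), ...]) for an InChI string."""
    if not inchi.startswith("InChI="):
        raise ValueError("string does not start with 'InChI='")
    version, *rest = inchi[len("InChI="):].split("/")
    formula = rest[0] if rest else ""
    # Each later segment starts with a one-letter layer prefix (c, h, f, ...).
    layers = [(segment[0], segment[1:]) for segment in rest[1:] if segment]
    return version, formula, layers


version, formula, layers = split_inchi(INCHI)
print(version, formula)
for prefix, value in layers:
    print(prefix, value)
```

Here the formula layer is C5H4N4, the c layer holds the connections, the first h layer the hydrogens (including the mobile one), and the segments after f fix that hydrogen to a specific atom.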

Limitations (1)

• Stereochemistry other than sp3 tetrahedral and sp2 trigonal planar
• Polymers
• Conformers
• Radicals/different spin state
• Topological isomers
• Mixtures
• Markush structures

Limitations (2)

InChI=1/2ClH.2H3N.Pt/h2*1H;2*1H3;/q;;;;+2/p-2

[Structure diagrams: cisplatin and transplatin]
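The limitation can be checked mechanically: the single InChI given on this slide for the platinum complex contains no stereo layer, so it cannot tell cisplatin from transplatin. The Python sketch below looks for the stereo layer prefixes b, t, m and s. The but-2-ene string is an illustrative example that is not taken from the slides, and the function name `stereo_layers` is hypothetical.

```python
import re

# InChI from the slide, shared by cisplatin and transplatin.
PLATIN = "InChI=1/2ClH.2H3N.Pt/h2*1H;2*1H3;/q;;;;+2/p-2"

# Illustrative contrast (not from the slides): (E)-but-2-ene, which has a /b layer.
BUTENE = "InChI=1S/C4H8/c1-3-4-2/h3-4H,1-2H3/b4-3+"


def stereo_layers(inchi):
    """List the stereo layer prefixes (b, t, m, s) present in an InChI string."""
    return re.findall(r"/([btms])", inchi)


print(stereo_layers(PLATIN))  # no stereo layers at all
print(stereo_layers(BUTENE))  # double-bond stereo layer present
```

With no stereo layer to carry the square planar arrangement, both isomers collapse onto the same identifier.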

3-D ChEBI

cisplatin

Compositional uncertainty

Positional uncertainty

Configurational uncertainty

Conformational uncertainty

Uncertainty and ambiguity in chemistry

Examples

an alkali metal cation

vanadate(V) anion

[2H]ethanol

Compositional uncertainty

Examples

L-bromohistidine residue

pteroic acid (several tautomers)

Positional uncertainty

Examples

androstane

rel-(2R,3R)-2-amino-3-methylpentanoic acid

tetradec-11-enoic acid

Configurational uncertainty

Examples

cyclohexane: chair, boat, twist

protein secondary structure: α, β, …

Conformational uncertainty

• Molecular structure ontology
• Subatomic particle ontology
• Role ontology: Biological role, Application

ChEBI ontology

Molecular structure ontology: catecholamines

Biological role: hormone

Application: antiglaucoma, bronchodilator, cardiostimulant

L-adrenaline

The family relations

L-cysteine

L-cysteine(•)

L-cysteinate(2–)

L-cysteinate(1–)

L-cysteinyl

L-cysteinium

L-cysteino

L-cystein-S-yl

L-cysteine residue

L-cysteinate residue

D-cysteine

cysteine

L-cysteine zwitterion

Relationships in ChEBI

• ∆ Is A (generic)
• ⋄ Has Part (generic)
• ♯ Is Conjugate Acid Of (specific)
• ♭ Is Conjugate Base Of (specific)
• Is Enantiomer Of (specific)
• Is Tautomer Of (specific)
• ℛ Is Substituent Group From (specific)
• ℋ Has Parent Hydride (specific)
• ℱ Has Functional Parent (specific)
• Has Role (generic?)
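One way to picture these relationships is as subject, relationship, object triples in which several relationship types come in inverse pairs. The Python sketch below uses examples taken from later slides. Treating "is enantiomer of" and "is tautomer of" as symmetric is an assumption of this sketch, and the names `INVERSE`, `SYMMETRIC` and `with_inverses` are hypothetical rather than part of ChEBI.

```python
# Inverse pairs named on the slides.
INVERSE = {
    "has part": "is part of",
    "is conjugate acid of": "is conjugate base of",
    "has parent hydride": "is parent hydride of",
    "has functional parent": "is functional parent of",
}
# Assumption of this sketch: these two relationships read the same both ways.
SYMMETRIC = {"is enantiomer of", "is tautomer of"}

# Example triples taken from the relationship slides.
TRIPLES = {
    ("L-cysteine", "is a", "cysteine"),
    ("L-cysteine", "is enantiomer of", "D-cysteine"),
    ("L-cysteine hydrochloride", "has part", "L-cysteinium"),
    ("salutaridinol", "has parent hydride", "morphinan"),
    ("7-O-acetylsalutaridinol", "has functional parent", "salutaridinol"),
}


def with_inverses(triples):
    """Return the triples plus every inverse or mirrored triple they imply."""
    both_ways = {**INVERSE, **{v: k for k, v in INVERSE.items()}}
    result = set(triples)
    for subject, relation, obj in triples:
        if relation in SYMMETRIC:
            result.add((obj, relation, subject))
        elif relation in both_ways:
            result.add((obj, both_ways[relation], subject))
    return result


for triple in sorted(with_inverses(TRIPLES)):
    print(triple)
```

"Is a" is left one-directional here: L-cysteine is a cysteine, but not the other way round.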

Is A relationship

[Structure diagrams: L-cysteine and cysteine]

L-cysteine is a (∆) cysteine

Is Enantiomer Of

[Structure diagrams: L-cysteine and D-cysteine, each with its own is a (∆) link]

L-cysteine is enantiomer of D-cysteine

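Has Part

[Structure diagrams: L-cysteine hydrochloride and its parts, L-cysteinium and Cl-]

• L-cysteine hydrochloride has part L-cysteinium
• L-cysteine hydrochloride has part Cl-
• L-cysteinium is part of L-cysteine hydrochloride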

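Is Conjugate Acid Of

[Structure diagrams: L-cysteinium, L-cysteine, L-cysteinate(1–) and L-cysteinate(2–)]

• L-cysteinium is conjugate acid of (♯) L-cysteine
• L-cysteine is conjugate acid of (♯) L-cysteinate(1–)
• L-cysteinate(1–) is conjugate acid of (♯) L-cysteinate(2–)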

Is Conjugate Base Of

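[Structure diagrams: L-cysteinium, L-cysteine, L-cysteinate(1–) and L-cysteinate(2–)]

• L-cysteine is conjugate base of (♭) L-cysteinium
• L-cysteinate(1–) is conjugate base of (♭) L-cysteine
• L-cysteinate(2–) is conjugate base of (♭) L-cysteinate(1–)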

Acid/base relationships

[Structure diagrams: L-cysteinium, L-cysteine, L-cysteinate(1–) and L-cysteinate(2–), with neighbouring species linked by ♯ (is conjugate acid of) and ♭ (is conjugate base of)]
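The acid/base links between the four protonation states follow from their net charges: species that differ by one proton are neighbours. The Python sketch below derives both directions from a charge-sorted list. The charges are read off the names and diagrams on these slides, and the function name `acid_base_links` is hypothetical.

```python
# Protonation states from the slides with net charges read off their names.
SPECIES = [
    ("L-cysteinate(2–)", -2),
    ("L-cysteine", 0),
    ("L-cysteinium", +1),
    ("L-cysteinate(1–)", -1),
]


def acid_base_links(species):
    """Link species whose net charges differ by exactly one unit."""
    ordered = sorted(species, key=lambda item: item[1], reverse=True)
    links = []
    for (acid, acid_charge), (base, base_charge) in zip(ordered, ordered[1:]):
        if acid_charge - base_charge == 1:
            links.append((acid, "is conjugate acid of", base))
            links.append((base, "is conjugate base of", acid))
    return links


for link in acid_base_links(SPECIES):
    print(link)
```

Four species give three acid/base pairs, hence three ♯ links and three ♭ links.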

Is Tautomer Of

[Structure diagrams: L-cysteine and L-cysteine zwitterion]

L-cysteine is tautomer of L-cysteine zwitterion

Is Tautomer Of

[Structure diagrams: 1H-pyrrole, 2H-pyrrole and 3H-pyrrole]

1H-pyrrole, 2H-pyrrole and 3H-pyrrole are tautomers of one another

Has Parent Hydride

[Structure diagrams: salutaridinol and morphinan]

• salutaridinol has parent hydride (ℋ) morphinan
• morphinan is parent hydride of salutaridinol

Has Functional Parent

[Structure diagrams: 7-O-acetylsalutaridinol and salutaridinol]

• 7-O-acetylsalutaridinol has functional parent salutaridinol
• salutaridinol is functional parent of 7-O-acetylsalutaridinol

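Is Substituent Group From

[Structure diagrams: L-cysteine and the groups L-cysteinyl, L-cysteino and L-cysteine residue, with asterisks (*) at the points of attachment]

L-cysteinyl, L-cysteino and L-cysteine residue are each linked to L-cysteine by is substituent group from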

The family relations

L-cysteine

L-cysteine(•)

L-cysteinate(2–)

L-cysteinate(1–)

L-cysteinyl

L-cysteinium

L-cysteino

L-cystein-S-yl

L-cysteine residue

L-cysteinate residue

D-cysteine

cysteine

L-cysteine zwitterion

[Diagram: the entities above connected by ♯ (is conjugate acid of) and ♭ (is conjugate base of) links]

Ontology of L-cysteine (1)

Ontology of L-cysteine (2)

Thank you