Smiles arbitrary target specification
Smiles ARbitrary Target Specification (SMARTS) is a language for specifying substructural patterns in
molecules. The SMARTS line notation is expressive and allows extremely precise and transparent substructural specification and atom typing.
SMARTS is related to the SMILES line notation [Simplified Molecular Input Line Entry Specification (SMILES), Wikipedia] that is used to encode molecular structures and, like SMILES, was originally developed by David Weininger and colleagues at Daylight Chemical Information Systems. The most comprehensive descriptions of the SMARTS language can be found in Daylight's SMARTS theory manual [ [http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html SMARTS Theory Manual] , Daylight Chemical Information Systems, Santa Fe, New Mexico], tutorial [ [http://www.daylight.com/dayhtml_tutorials/languages/smarts/index.html#INTRO SMARTS Tutorial] , Daylight Chemical Information Systems, Santa Fe, New Mexico] and examples [ [http://www.daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html SMARTS Examples] , Daylight Chemical Information Systems, Santa Fe, New Mexico]. OpenEye Scientific Software has developed its own version of SMARTS, which differs from the original Daylight version in how the R descriptor (used for ring membership, discussed below) is defined.
Atoms can be specified by symbol or atomic number. Aliphatic carbon is matched by [C], aromatic carbon by [c] and any carbon by [#6] or [C,c]. The wild card symbols *, A and a match any atom, any aliphatic atom and any aromatic atom respectively. Implicit hydrogens are considered to be a characteristic of atoms, so the SMARTS for an amino group can be written as [NH2]. Charge is specified by the descriptors '+' and '-', as exemplified by the SMARTS [nH+] (protonated aromatic nitrogen atom) and [O-]C(=O)c (deprotonated aromatic carboxylic acid).
A number of bond types can be specified: '-' (single), '=' (double), '#' (triple), ':' (aromatic) and '~' (any).
The X and D descriptors specify, respectively, the total number of connections (including implicit hydrogen atoms) and the number of connections to explicit atoms. Thus [CX4] matches carbon atoms with bonds to four other atoms (hydrogens included), while [CD4] matches quaternary carbon.
As originally defined by Daylight, the R descriptor is used to specify ring membership. In the Daylight model for cyclic systems, the smallest set of smallest rings (SSSR) [Downs, G.M.; Gillet, V.J.; Holliday, J.D.; Lynch, M.F. (1989) [http://dx.doi.org/10.1021/ci00063a007 A Review of Ring Perception Algorithms for Chemical Graphs] , "J. Chem. Inf. Comput. Sci." 29, 172-187] is used as a basis for ring membership. For example, indole is perceived as a 5-membered ring fused with a 6-membered ring rather than a 9-membered ring. The two carbon atoms that make up the ring fusion would match [cR2] and the other carbon atoms would match [cR1].
The SSSR model has been criticised by OpenEye [ [http://www.eyesopen.com/docs/html/cplusprog/node68.html Smallest Set of Smallest Rings (SSSR) considered Harmful] , OEChem - C++ Manual, Version 1.5.1, OpenEye Scientific Software, Santa Fe, New Mexico] who, in their implementation of SMARTS, use R to denote the number of ring bonds for an atom. The two carbon atoms in the ring fusion match [cR3] and the other carbons match [cR2] in the OpenEye implementation of SMARTS. Used without a number, R specifies an atom in a ring in both implementations, for example [CR] (aliphatic carbon atom in ring).
Lower case r specifies the size of the smallest ring of which the atom is a member. The carbon atoms of the ring fusion would both match [cr5] . Bonds can be specified as cyclic, for example C@C matches directly bonded atoms in a ring.
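The difference between the two readings of R, and the meaning of lower case r, can be checked by hand on indole. The following is a minimal sketch rather than a real ring perception routine: the two rings are typed in directly, the atom labels (N1, C2, ..., C7a) and the function names are chosen here for illustration, and ring bonds are simply taken from those two rings, which is sufficient for indole.

```python
# Toy illustration only: the two rings of indole are entered by hand rather
# than perceived from a molecular graph. Atom labels are illustrative.
FIVE_RING = ["N1", "C2", "C3", "C3a", "C7a"]
SIX_RING = ["C3a", "C4", "C5", "C6", "C7", "C7a"]
SSSR = [FIVE_RING, SIX_RING]


def ring_bonds(ring):
    """Return the bonds of a ring as a set of unordered atom pairs."""
    size = len(ring)
    return {frozenset((ring[i], ring[(i + 1) % size])) for i in range(size)}


def daylight_R(atom):
    """Number of SSSR rings that contain the atom (Daylight-style R)."""
    return sum(atom in ring for ring in SSSR)


def openeye_R(atom):
    """Number of ring bonds on the atom (OpenEye-style R)."""
    all_ring_bonds = set().union(*(ring_bonds(ring) for ring in SSSR))
    return sum(atom in bond for bond in all_ring_bonds)


def smallest_ring(atom):
    """Size of the smallest ring that contains the atom (lower case r)."""
    return min(len(ring) for ring in SSSR if atom in ring)


for atom in ("C3a", "C7a", "C2", "C5"):
    print(atom, daylight_R(atom), openeye_R(atom), smallest_ring(atom))
```

For the ring fusion atoms C3a and C7a this gives 2 rings, 3 ring bonds and a smallest ring size of 5, in line with [cR2], [cR3] and [cr5] above.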
Four logical operators allow atom and bond descriptors to be combined. The low-priority 'and' operator ';' can be used to define a protonated primary amine as [N;H3;+][C;X4]. The 'or' operator ',' has higher priority than ';', so [c,n;H] defines (aromatic carbon or aromatic nitrogen) with implicit hydrogen. The 'and' operator '&' has higher priority than ',', so [c,n&H] defines aromatic carbon or (aromatic nitrogen with implicit hydrogen).
The 'not' operator '!' can be used to define unsaturated aliphatic carbon as [C;!X4] and acyclic bonds as *-!@*.
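The operator priorities described above can be made concrete by grouping an atom expression into a nested structure. The sketch below is a deliberately simplified illustration, not a SMARTS parser: the function name is hypothetical, it takes the text between the square brackets, it leaves '!' attached to the primitive it negates, and it does not handle implicit 'and' by juxtaposition (as in NH2) or recursive $(...) expressions.

```python
def parse_atom_expression(expr):
    """Group a simple atom expression by operator priority.

    ';' (low-priority and) binds loosest, then ',' (or), then '&'
    (high-priority and). A '!' stays attached to the primitive it negates.
    """
    def split(text, ops):
        op, rest = ops[0], ops[1:]
        parts = text.split(op)
        children = [split(part, rest) if rest else part for part in parts]
        # A single child means this operator did not occur at this level.
        return children[0] if len(children) == 1 else (op, children)

    return split(expr, [";", ",", "&"])


print(parse_atom_expression("c,n;H"))   # (c or n) and H
print(parse_atom_expression("c,n&H"))   # c or (n and H)
print(parse_atom_expression("C;!X4"))   # C and (not X4)
```

Each result is either a bare primitive or a pair of (operator, operands), so the nesting shows directly which descriptors are grouped together.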
Recursive SMARTS allow detailed specification of an atom's environment. For example, the more reactive (with respect to electrophilic aromatic substitution) ortho and para carbon atoms of phenol can be defined as: [$(c1c([OH])cccc1),$(c1ccc([OH])cc1)]
Examples of SMARTS
A number of illustrative examples of SMARTS have been assembled by Daylight.
The definitions of hydrogen bond donors and acceptors used to apply Lipinski's Rule of Five [Lipinski, C.A.; Lombardo, F.; Dominy, B.W.; Feeney, P.J. (1997) [http://dx.doi.org/10.1016/S0169-409X(00)00129-0 Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings] "Adv. Drug Del. Rev.", 46, 3-26.] are easily coded in SMARTS. Donors are defined as nitrogen or oxygen atoms that have at least one directly bonded hydrogen atom:
[N,n,O;!H0] or [#7,#8;!H0] (aromatic oxygen cannot have a bonded hydrogen)
Acceptors are defined as nitrogen or oxygen:
[N,n,O,o] or [#7,#8]
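What these two patterns test can be spelled out as plain predicates on an atom's atomic number and hydrogen count. The sketch below is illustrative only: the Atom record and the function names are assumptions made for this example, and the molecule (ethanolamine, SMILES NCCO) is entered by hand as a list of heavy atoms with their attached hydrogen counts rather than parsed from a structure.

```python
from collections import namedtuple

# Hypothetical minimal atom record: atomic number and attached hydrogen count.
Atom = namedtuple("Atom", "atomic_number hydrogens")


def is_donor(atom):
    """Same test as [#7,#8;!H0]: N or O with at least one hydrogen."""
    return atom.atomic_number in (7, 8) and atom.hydrogens != 0


def is_acceptor(atom):
    """Same test as [#7,#8]: any N or O."""
    return atom.atomic_number in (7, 8)


# Ethanolamine (NCCO): NH2, CH2, CH2, OH
ethanolamine = [Atom(7, 2), Atom(6, 2), Atom(6, 2), Atom(8, 1)]

donors = sum(is_donor(atom) for atom in ethanolamine)
acceptors = sum(is_acceptor(atom) for atom in ethanolamine)
print(donors, acceptors)
```

Under these definitions both the nitrogen and the oxygen of ethanolamine count as donors and as acceptors, while the two carbon atoms match neither pattern.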
A simple definition of aliphatic amines that are likely to protonate at physiological pH can be written as the following recursive SMARTS:
[$([NH2][CX4]),$([NH]([CX4])[CX4]),$([NX3]([CX4])([CX4])[CX4])]
In real applications the CX4 atoms would need to be defined more precisely to prevent matching against electron-withdrawing groups such as CF3 that would render the amine insufficiently basic to protonate at physiological pH.
SMARTS can be used to encode pharmacophore elements such as anionic centers. In the following example, recursive SMARTS notation is used to combine acid oxygen and tetrazole nitrogen in a definition of atoms that are likely to be anionic under normal physiological conditions.
[$([OH][C,S,P]=O),$([nH]1nnnc1)]
The SMARTS above would only match the acid hydroxyl and the tetrazole NH. When a carboxylic acid deprotonates, the negative charge is delocalised over both oxygen atoms and it may be desirable to designate both as anionic. This can be achieved using the following SMARTS.
[$([OH]C=O),$(O=C[OH])]
Applications of SMARTS
The precise and transparent substructural specification that SMARTS allows has been exploited in a number of applications.
Substructural filters defined in SMARTS have been used [Hann, M.; Hudson, B.; Lewell, X.; Lifely, R.; Miller, L.; Ramsden, N. (1999) [http://dx.doi.org/10.1021/ci990423o Strategic Pooling of Compounds for High-Throughput Screening] "J. Chem. Inf. Comput. Sci." 39, 897-902] to identify undesirable compounds when performing strategic pooling of compounds for high-throughput screening. The REOS (rapid elimination of swill) [Walters, W.P.; Murcko, M.A. (2002) [http://dx.doi.org/10.1016/S0169-409X(02)00003-0 Prediction of drug-likeness] "Adv. Drug. Deliv. Rev." 54, 255-271] procedure uses SMARTS to filter out reactive, toxic and otherwise undesirable moieties from databases of chemical structures.
RECAP [Lewell, X.Q.; Judd, D.B.; Watson, S.P.; Hann, M.M. (1998) [http://dx.doi.org/10.1021/ci970429i RECAP-Retrosynthetic Combinatorial Analysis Procedure: A Powerful New Technique for Identifying Privileged Molecular Fragments with Useful Applications in Combinatorial Chemistry] "J. Chem. Inf. Comput. Sci." 38, 511-522.] (Retrosynthetic Combinatorial Analysis Procedure) uses SMARTS to define bond types. RECAP is a molecule editor which generates fragments of structures by breaking bonds of defined types; the original link points in these fragments are specified using isotopic labels. Searching databases of biologically active compounds for occurrences of fragments allows privileged structural motifs to be identified. The Molecular Slicer [Vieth, M.; Siegel, M.G.; Higgs, R.E.; Watson, I.A.; Robertson, D.H.; Savin, K.A.; Durst, G.L.; Hipskind, P.A. (2004) [http://dx.doi.org/10.1021/jm030267j Characteristic Physical Properties and Structural Fragments of Marketed Oral Drugs] "J. Med. Chem." 47, 224-232.] is similar to RECAP and has been used to identify fragments that are commonly found in marketed oral drugs.
The Leatherface program [Kenny, P.W.; Sadowski, J. (2005) [http://dx.doi.org/10.1002/3527603743.ch11 Structure modification in chemical databases] "Chemoinformatics in drug discovery", 271-285.] is a general purpose molecule editor which allows automated modification of a number of substructural features of molecules in databases, including protonation state, hydrogen count, formal charge, isotopic weight and bond order. The molecular editing rules used by Leatherface are defined in SMARTS. Leatherface can be used to standardise tautomeric and ionization states and to set and enumerate these in preparation of databases [Lyne, P.D.; Kenny, P.W.; Cosgrove, D.A.; Deng, C.; Zabludoff, S.; Wendoloski, J.J.; Ashwell, S. (2004) [http://dx.doi.org/10.1021/jm030504i Identification of Compounds with Nanomolar Binding Affinity for Checkpoint Kinase-1 Using Knowledge-Based Virtual Screening] "J. Med. Chem." 47, 1962-1968.] for virtual screening. Leatherface has been used in Matched Molecular Pair Analysis, which enables the effects of structural changes (e.g. substitution of hydrogen with chlorine) to be quantified [Leach, A.G.; Jones, H.D.; Cosgrove, D.A.; Kenny, P.W.; Ruston, L.; MacFaul, P.; Wood, J.M.; Colclough, N.; Law, B. (2006) [http://dx.doi.org/10.1021/jm0605233 Matched Molecular Pairs as a Guide in the Optimization of Pharmaceutical Properties; a Study of Aqueous Solubility, Plasma Protein Binding and Oral Exposure] "J. Med. Chem." 49, 6672-6682.] over a range of structural types.
ALADDIN [Van Drie, J.H.; Weininger, D.; Martin, Y.C. (1989) [http://dx.doi.org/10.1007/BF01533070 ALADDIN: An integrated tool for computer-assisted molecular design and pharmacophore recognition from geometric, steric, and substructure searching of three-dimensional molecular structures] "J. Comput.-Aid. Mol. Des." 3, 225-251.] is a pharmacophore matching program that uses SMARTS to define recognition points (e.g. neutral hydrogen bond acceptor) of pharmacophores. A key problem in pharmacophore matching is that functional groups that are likely to be ionised at physiological pH are typically registered in their neutral forms in structural databases. The ROCS shape matching program allows atom types to be defined using SMARTS. [ [http://www.eyesopen.com/products/applications/rocs.html OpenEye Scientific Software | ROCS] ]
Wikimedia Foundation. 2010.