Description Simplified molecular-input line-entry system
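Description
Atoms
Atoms are represented by the standard abbreviation of the chemical elements, in square brackets, such as [Au] for gold. Brackets may be omitted in the common case of atoms which:
are in the "organic subset" of B, C, N, O, P, S, F, Cl, Br, or I, and
have no formal charge, and
have the number of attached hydrogens implied by the SMILES valence model (typically their normal valence, but for N and P it is 3 or 5, and for S it is 2, 4, or 6), and
are the normal isotopes, and
are not chiral centers.
All other elements must be enclosed in brackets, and have charges and hydrogens shown explicitly. For instance, the SMILES for water may be written as either O or [OH2]. Hydrogen may also be written as a separate atom; water may also be written as [H]O[H].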
When brackets are used, the symbol H is added if the atom in brackets is bonded to one or more hydrogens, followed by the number of hydrogen atoms if greater than 1, then by the sign + for a positive charge or - for a negative charge. For example, [NH4+] is ammonium. If there is more than one charge, it is normally written as a digit; however, it is also possible to repeat the sign as many times as the ion has charges: one may write either [Ti+4] or [Ti++++] for the titanium(IV) cation (Ti4+). Thus, the hydroxide anion is represented by [OH-], the hydronium cation by [OH3+], and the cobalt(III) cation (Co3+) by either [Co+3] or [Co+++].
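The bracket-atom rules above can be sketched as a small parser. This is only an illustration using Python's standard `re` module: the names `BRACKET_ATOM` and `parse_bracket_atom` are hypothetical, and the pattern covers just the fields discussed in this article (isotope, symbol, chirality tag, hydrogen count, charge), not the full SMILES grammar.

```python
import re

# Hypothetical pattern for a single bracket atom; an illustrative sketch only.
BRACKET_ATOM = re.compile(
    r"\[(?P<isotope>\d+)?"                  # optional isotope mass number
    r"(?P<symbol>[A-Z][a-z]?|[a-z]{1,2})"   # element symbol (lower case = aromatic)
    r"(?P<chiral>@{1,2})?"                  # optional @ or @@ tag (ignored below)
    r"(?P<hcount>H\d*)?"                    # optional H, with a count if above 1
    r"(?P<charge>\++\d*|-+\d*)?"            # sign plus digit, or a repeated sign
    r"\]"
)


def parse_bracket_atom(text):
    """Split one bracket atom such as '[NH4+]' into its parts."""
    match = BRACKET_ATOM.fullmatch(text)
    if match is None:
        raise ValueError(f"not a bracket atom this sketch understands: {text!r}")

    hcount = match.group("hcount")
    charge = match.group("charge")
    if charge is None:
        value = 0
    else:
        digits = charge.lstrip("+-")
        # '+4' carries its magnitude as a digit; '++++' carries it as a length.
        magnitude = int(digits) if digits else len(charge)
        value = magnitude if charge[0] == "+" else -magnitude

    isotope = match.group("isotope")
    return {
        "isotope": int(isotope) if isotope else None,
        "symbol": match.group("symbol"),
        "hydrogens": 0 if hcount is None else int(hcount[1:] or 1),
        "charge": value,
    }
```

With this sketch, `parse_bracket_atom("[Ti+4]")` and `parse_bracket_atom("[Ti++++]")` return the same dictionary, which mirrors the two equivalent ways of writing a multiple charge.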
Bonds
A bond is represented using one of the symbols . - = # $ : / or \ .
Bonds between aliphatic atoms are assumed to be single unless specified otherwise, and are implied by adjacency in the SMILES string. Although single bonds may be written as -, this is usually omitted. For example, the SMILES for ethanol may be written as C-C-O, CC-O or C-CO, but is usually written CCO.
Double, triple, and quadruple bonds are represented by the symbols =, #, and $ respectively, as illustrated by the SMILES O=C=O (carbon dioxide), C#N (hydrogen cyanide), and [Ga-]$[As+] (gallium arsenide).
An additional type of bond is a "non-bond", indicated with a period (.), which shows that two parts are not bonded together. For example, aqueous sodium chloride may be written as [Na+].[Cl-] to show the dissociation.
An aromatic "one and a half" bond may be indicated with : ; see § Aromaticity below.
Single bonds adjacent to double bonds may be represented using / or \ to indicate stereochemical configuration; see § Stereochemistry below.
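As a rough illustration of the bond symbols listed above, the following sketch collects the bonds that are written explicitly in a SMILES string. The table `BOND_ORDER` and the function `explicit_bonds` are hypothetical names chosen for this example. Characters inside square brackets are skipped, because there + and - denote charges and not bonds.

```python
# Hypothetical lookup table: bond symbol -> nominal bond order.
# '.' is the non-bond; '/' and '\\' are directional single bonds.
BOND_ORDER = {"-": 1, "=": 2, "#": 3, "$": 4, ":": 1.5, ".": 0, "/": 1, "\\": 1}


def explicit_bonds(smiles):
    """Return the bond symbols written explicitly, ignoring bracket atoms."""
    bonds = []
    in_bracket = False
    for ch in smiles:
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif not in_bracket and ch in BOND_ORDER:
            bonds.append(ch)
    return bonds
```

For instance, `explicit_bonds("CCO")` returns an empty list because every bond in that form is implied by adjacency, whereas `explicit_bonds("C-C-O")` returns two single-bond symbols for the same molecule.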
Rings
Ring structures are written by breaking each ring at an arbitrary point (although some choices lead to a more legible SMILES than others) to make an acyclic structure, and adding numerical ring-closure labels to show connectivity between non-adjacent atoms.
For example, cyclohexane and dioxane may be written as C1CCCCC1 and O1CCOCC1 respectively. For a second ring, the label is 2. For example, decalin (decahydronaphthalene) may be written as C1CCCC2C1CCCC2.
SMILES does not require that ring numbers be used in any particular order, and permits ring number zero, although this is rarely used. It is also permitted to reuse ring numbers after the first ring has closed, although this usually makes formulae harder to read. For example, bicyclohexyl is usually written as C1CCCCC1C2CCCCC2, but it may also be written as C0CCCCC0C0CCCCC0.
Multiple digits after a single atom indicate multiple ring-closing bonds. For example, an alternative SMILES notation for decalin is C1CCCC2CCCCC12, where the final carbon participates in both ring-closing bonds 1 and 2. If two-digit ring numbers are required, the label is preceded by %, so C%12 is a single ring-closing bond of ring 12.
Ring-closing digits may be preceded by a bond type to indicate the type of the ring-closing bond. For example, cyclopropene is usually written C1=CC1, but if the double bond is chosen as the ring-closing bond, it may be written as C=1CC1, C1CC=1, or C=1CC=1. (The first form is preferred.) C=1CC-1 is illegal, as it explicitly specifies conflicting types for the ring-closing bond.
Ring-closing bonds may not be used to denote multiple bonds. For example, C1C1 is not a valid alternative to C=C for ethylene. However, they may be used with non-bonds; C1.C2.C12 is a peculiar but legal alternative way to write propane, more commonly written CCC.
Choosing a ring-break point adjacent to attached groups can lead to a simpler SMILES form by avoiding branches. For example, cyclohexane-1,2-diol is most simply written as OC1CCCCC1O; choosing a different ring-break location produces a branched structure that requires parentheses to write.
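The pairing of ring-closure labels described above can be sketched with a dictionary of currently open labels. The names `TOKEN` and `ring_closures` are hypothetical, and the tokenizer is deliberately minimal (organic-subset atoms, bracket atoms, bond symbols, parentheses, and dots). It rejects unclosed labels and conflicting bond types such as C=1CC-1, but it does not detect other invalid forms such as C1C1.

```python
import re

# Hypothetical minimal tokenizer; real SMILES parsers handle far more cases.
TOKEN = re.compile(
    r"(?P<atom>\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops])"
    r"|(?P<bond>[-=#$:/\\])"
    r"|(?P<ring>%\d{2}|\d)"
    r"|(?P<other>[().])"
)


def ring_closures(smiles):
    """Return (first atom index, second atom index, bond symbol or None)
    for every ring-closing bond, with atoms numbered in order of appearance."""
    open_rings = {}      # label -> (atom index, bond symbol or None)
    closures = []
    atom_index = -1
    pending_bond = None  # bond symbol seen just before the current token
    pos = 0
    while pos < len(smiles):
        match = TOKEN.match(smiles, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        pos = match.end()
        kind, text = match.lastgroup, match.group()
        if kind == "atom":
            atom_index += 1
            pending_bond = None
        elif kind == "bond":
            pending_bond = text
        elif kind == "ring":
            label = int(text.lstrip("%"))
            if label in open_rings:
                # Second occurrence closes the ring and frees the label for reuse.
                start, start_bond = open_rings.pop(label)
                if start_bond and pending_bond and start_bond != pending_bond:
                    raise ValueError(f"conflicting bond types for ring {label}")
                closures.append((start, atom_index, start_bond or pending_bond))
            else:
                open_rings[label] = (atom_index, pending_bond)
            pending_bond = None
        else:
            pending_bond = None
    if open_rings:
        raise ValueError(f"unclosed ring labels: {sorted(open_rings)}")
    return closures
```

Under these assumptions, the two bicyclohexyl forms C1CCCCC1C2CCCCC2 and C0CCCCC0C0CCCCC0 yield the same list of closures, which shows that a label is free to be reused once its ring has closed.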
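Aromaticity
Aromatic rings such as benzene may be written in one of three forms:
in Kekulé form with alternating single and double bonds, e.g. C1=CC=CC=C1,
using the aromatic bond symbol :, e.g. C:1:C:C:C:C:C1, or
most commonly, by writing the constituent B, C, N, O, P and S atoms in the lower-case forms b, c, n, o, p and s, respectively.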
In the latter case, bonds between two aromatic atoms are assumed (if not explicitly shown) to be aromatic bonds. Thus, benzene, pyridine and furan can be represented respectively by the SMILES c1ccccc1, n1ccccc1 and o1cccc1.
Aromatic nitrogen bonded to hydrogen, as found in pyrrole, must be represented as [nH]; thus imidazole is written in SMILES notation as n1c[nH]cc1.
When aromatic atoms are singly bonded to each other, such as in biphenyl, a single bond must be shown explicitly: c1ccccc1-c2ccccc2. This is one of the few cases where the single bond symbol - is required. (In fact, most SMILES software can correctly infer that the bond between the two rings cannot be aromatic, and so will accept the form c1ccccc1c2ccccc2.)
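The lower-case convention can be illustrated with a short sketch that lists the atoms a SMILES string marks as aromatic. The function name `aromatic_atoms` is hypothetical. The tokenizer matches two-letter symbols such as Cl before single letters so that the "l" of Cl is never mistaken for an atom, and it reads the symbol of a bracket atom such as [nH] from inside the brackets.

```python
import re

# Atoms only: bracket atoms first, then two-letter halogens, then single letters.
ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")
# Symbol inside a bracket atom, after an optional isotope number.
BRACKET_SYMBOL = re.compile(r"\[\d*([A-Za-z][a-z]?)")


def aromatic_atoms(smiles):
    """Return the symbols of all atoms written in lower case (aromatic)."""
    found = []
    for token in ATOM.findall(smiles):
        if token.startswith("["):
            symbol = BRACKET_SYMBOL.match(token).group(1)
        else:
            symbol = token
        if symbol[0].islower():
            found.append(symbol)
    return found
```

Note that this only reports how the string is written: the Kekulé form C1=CC=CC=C1 describes the same benzene ring but yields an empty list, since recognising aromaticity from alternating bonds needs a perception algorithm that this sketch does not attempt.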
The Daylight and OpenEye algorithms for generating canonical SMILES differ in their treatment of aromaticity.
Visualization of 3-cyanoanisole as COc(c1)cccc1C#N.
Branching
Branches are described with parentheses, as in CCC(=O)O for propionic acid and FC(F)F for fluoroform. The first atom within the parentheses, and the first atom after the parenthesized group, are both bonded to the same branch point atom.
Substituted rings can be written with the branching point in the ring, as illustrated by the SMILES COc(c1)cccc1C#N and COc(cc1)ccc1C#N, which encode the 3- and 4-cyanoanisole isomers. Writing SMILES for substituted rings in this way can make them more human-readable.
Branches may be written in any order. For example, bromochlorodifluoromethane may be written as FC(Br)(Cl)F, BrC(F)(F)Cl, C(F)(Cl)(F)Br, or the like. Generally, a SMILES form is easiest to read if the simpler branch comes first, with the final, unparenthesized portion being the most complex. The only caveats to such rearrangements are:
If ring numbers are reused, they are paired according to their order of appearance in the SMILES string. Some adjustments may be required to preserve the correct pairing.
If stereochemistry is specified, adjustments must be made; see § Stereochemistry below.
The one form of branch which does not require parentheses is a ring-closing bond. Choosing ring-closing bonds appropriately can reduce the number of parentheses required. For example, toluene is normally written as Cc1ccccc1 or c1ccccc1C, avoiding the parentheses required if it is written as c1cc(C)ccc1 or c1cc(ccc1)C.
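The parenthesis rule lends itself to a stack: an opening parenthesis remembers the current branch point, and a closing parenthesis returns to it. The sketch below assumes a ring-free SMILES made only of unbracketed organic-subset atoms; `connectivity` and `edge_symbols` are hypothetical helper names, and bond symbols are ignored because only connectivity is tracked.

```python
import re

# Only unbracketed, non-aromatic organic-subset atoms and parentheses are read.
ATOM_OR_PAREN = re.compile(r"Cl|Br|[BCNOPSFI]|[()]")


def connectivity(smiles):
    """Return (atoms, bonds) for a ring-free SMILES, where bonds are
    pairs of atom indices in order of appearance."""
    atoms, bonds, stack = [], [], []
    previous = None
    for token in ATOM_OR_PAREN.findall(smiles):
        if token == "(":
            stack.append(previous)      # remember the branch point
        elif token == ")":
            previous = stack.pop()      # return to the branch point
        else:
            atoms.append(token)
            current = len(atoms) - 1
            if previous is not None:
                bonds.append((previous, current))
            previous = current
    return atoms, bonds


def edge_symbols(smiles):
    """Sorted list of bonded element pairs, independent of branch order."""
    atoms, bonds = connectivity(smiles)
    return sorted(tuple(sorted((atoms[a], atoms[b]))) for a, b in bonds)
```

Comparing `edge_symbols("FC(Br)(Cl)F")`, `edge_symbols("BrC(F)(F)Cl")` and `edge_symbols("C(F)(Cl)(F)Br")` gives the same list each time, reflecting that branch order does not change the molecule. This comparison of element pairs is not a general graph-isomorphism test; it is enough only for small examples like this one.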
Stereochemistry
trans-1,2-difluoroethylene
SMILES permits, but does not require, specification of stereoisomers.
Configuration around double bonds is specified using the characters / and \ to show directional single bonds adjacent to a double bond. For example, F/C=C/F is one representation of trans-1,2-difluoroethylene, in which the fluorine atoms are on opposite sides of the double bond (as shown in the figure), whereas F/C=C\F is one possible representation of cis-1,2-difluoroethylene, in which the fluorines are on the same side of the double bond.
Bond direction symbols always come in groups of at least two, of which the first is arbitrary. That is, F\C=C\F is the same as F/C=C/F. When alternating single and double bonds are present, the groups are larger than two, with the middle directional symbols being adjacent to two double bonds. For example, the common form of (2,4)-hexadiene is written C/C=C/C=C/C.
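For the simplest case, a single C=C written linearly with one substituent on each side, the rule above reduces to comparing the two direction symbols. The sketch below is restricted to exactly that pattern; the names `SIMPLE_ALKENE` and `double_bond_geometry` are hypothetical, and branched forms such as C(/F)=C/F or conjugated chains are outside its scope and are rejected.

```python
import re

# Matches only the linear pattern X/C=C/Y (with either slash on each side).
SIMPLE_ALKENE = re.compile(
    r"^[A-Za-z]+(?P<left>[/\\])C=C(?P<right>[/\\])[A-Za-z]+$"
)


def double_bond_geometry(smiles):
    """Classify a linear X?C=C?Y SMILES as 'cis' or 'trans'."""
    match = SIMPLE_ALKENE.match(smiles)
    if match is None:
        raise ValueError("only the linear pattern X/C=C/Y is handled here")
    # Matching symbols put the substituents on opposite sides (trans);
    # differing symbols put them on the same side (cis).
    return "trans" if match.group("left") == match.group("right") else "cis"
```

Because only the relationship between the two symbols matters, `double_bond_geometry("F/C=C/F")` and `double_bond_geometry("F\\C=C\\F")` both give "trans".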
Beta-carotene, with the eleven double bonds highlighted.
As a more complex example, beta-carotene has a very long backbone of alternating single and double bonds, which may be written CC1CCC/C(C)=C1/C=C/C(C)=C/C=C/C(C)=C/C=C/C=C(C)/C=C/C=C(C)/C=C/C2=C(C)/CCCC2(C)C.
Configuration at tetrahedral carbon is specified by @ or @@. Consider the four bonds in the order in which they appear, left to right, in the SMILES form. Looking toward the central carbon from the perspective of the first bond, the other three are either clockwise or counter-clockwise. These cases are indicated with @@ and @, respectively (because the @ symbol itself is a counter-clockwise spiral).
L-Alanine
For example, consider the amino acid alanine. One of its SMILES forms is NC(C)C(=O)O, more fully written as N[CH](C)C(=O)O. L-Alanine, the more common enantiomer, is written as N[C@@H](C)C(=O)O. Looking from the N-C bond, the hydrogen (H), methyl (C), and carboxylate (C(=O)O) groups appear clockwise. D-Alanine can be written as N[C@H](C)C(=O)O.
While the order in which branches are specified in SMILES is normally unimportant, in this case it matters; swapping any two groups requires reversing the chirality indicator. If the branches are reversed so that alanine is written as NC(C(=O)O)C, then the configuration also reverses; L-alanine is written as N[C@H](C(=O)O)C. Other ways of writing it include C[C@H](N)C(=O)O, OC(=O)[C@@H](N)C and OC(=O)[C@H](C)N.
Normally, the first of the four bonds appears to the left of the carbon atom, but if the SMILES is written beginning with the chiral carbon, such as C(C)(N)C(=O)O, then all four are to the right, and the first to appear (the [CH] bond in this case) is used as the reference to order the following three: L-alanine may also be written [C@@H](C)(N)C(=O)O.
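The rule that swapping any two groups reverses the chirality indicator amounts to a permutation-parity check. In the sketch below, each stereocentre is described by hand as the list of its four neighbours in SMILES order (the preceding atom, then the bracketed H, then the remaining groups) plus its tag. The function names and the neighbour labels such as "CH3" and "COOH" are hypothetical conveniences for this example, not SMILES syntax.

```python
def permutation_parity(reference, other):
    """Return 0 if `other` is an even permutation of `reference`, 1 if odd."""
    if sorted(reference) != sorted(other):
        raise ValueError("both lists must contain the same neighbours")
    positions = [reference.index(item) for item in other]
    seen = [False] * len(positions)
    parity = 0
    for start in range(len(positions)):
        length = 0
        j = start
        while not seen[j]:          # walk one cycle of the permutation
            seen[j] = True
            j = positions[j]
            length += 1
        if length:
            parity ^= (length - 1) & 1   # a cycle of length k is k-1 swaps
    return parity


def same_configuration(order_a, tag_a, order_b, tag_b):
    """True if two (neighbour order, '@' or '@@') descriptions agree."""
    odd = permutation_parity(order_a, order_b) == 1
    flipped = tag_a != tag_b
    # An odd reordering must come with a flipped tag, an even one must not.
    return odd == flipped
```

Taking N[C@@H](C)C(=O)O as the reference, with neighbour order N, H, CH3, COOH, the form N[C@H](C(=O)O)C has the order N, H, COOH, CH3: one swap and a flipped tag, so `same_configuration` reports that both describe L-alanine.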
The SMILES specification includes elaborations on the @ symbol to indicate stereochemistry around more complex chiral centers, such as trigonal bipyramidal molecular geometry.
Isotopes
Isotopes are specified with a number equal to the integer isotopic mass preceding the atomic symbol. Benzene in which one atom is carbon-14 is written as [14c]1ccccc1, and deuterochloroform is [2H]C(Cl)(Cl)Cl.
Examples
To illustrate a molecule with more than 9 rings, consider cephalostatin-1, a steroidic trisdecacyclic (13-ring) pyrazine with the empirical formula C54H74N2O10, isolated from the Indian Ocean hemichordate Cephalodiscus gilchristi:
Starting with the left-most methyl group in the figure:
CC(C)(O1)C[C@@H](O)[C@@]1(O2)[C@@H](C)[C@@H]3CC=C4[C@]3(C2)C(=O)C[C@H]5[C@H]4CC[C@@H](C6)[C@]5(C)Cc(n7)c6nc(C[C@@]89(C))c7C[C@@H]8CC[C@@H]%10[C@@H]9C[C@@H](O)[C@@]%11(C)C%10=C[C@H](O%12)[C@]%11(O)[C@H](C)[C@]%12(O%13)[C@H](O)C[C@@]%13(C)CO
Note that % appears in front of the index of ring-closure labels above 9; see § Rings above.
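To see the % labels at work, the sketch below extracts every ring-closure label from a SMILES string and counts the ring-closing bonds. The function names are hypothetical, and the cephalostatin-1 string is the one given above, with standard SMILES letter case. Bracket atoms are blanked out first so that isotope numbers, hydrogen counts, and charges are not mistaken for ring labels.

```python
import re


def ring_closure_labels(smiles):
    """Return ring-closure labels in order of appearance, reading %nn as nn."""
    # Replace each bracket atom with '*' so digits inside it are ignored.
    without_brackets = re.sub(r"\[[^\]]*\]", "*", smiles)
    tokens = re.findall(r"%\d{2}|\d", without_brackets)
    return [int(token.lstrip("%")) for token in tokens]


def ring_bond_count(smiles):
    """Number of ring-closing bonds: each one uses its label exactly twice."""
    labels = ring_closure_labels(smiles)
    if len(labels) % 2:
        raise ValueError("odd number of ring-closure labels")
    return len(labels) // 2


CEPHALOSTATIN_1 = (
    "CC(C)(O1)C[C@@H](O)[C@@]1(O2)[C@@H](C)[C@@H]3CC=C4[C@]3(C2)C(=O)C"
    "[C@H]5[C@H]4CC[C@@H](C6)[C@]5(C)Cc(n7)c6nc(C[C@@]89(C))c7C[C@@H]8CC"
    "[C@@H]%10[C@@H]9C[C@@H](O)[C@@]%11(C)C%10=C[C@H](O%12)[C@]%11(O)"
    "[C@H](C)[C@]%12(O%13)[C@H](O)C[C@@]%13(C)CO"
)
```

For this string, `ring_bond_count(CEPHALOSTATIN_1)` gives 13, one ring-closing bond for each of the thirteen rings, and the labels run from 1 to 13 without any reuse.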
Other examples of SMILES
The SMILES notation is described extensively in the SMILES theory manual provided by Daylight Chemical Information Systems, where a number of illustrative examples are presented. Daylight's depict utility provides users with a means to check their own examples of SMILES and is a valuable educational tool.