The Cambridge Structural Database System

The distributed CSD system comprises the CSD itself together with six major software components. One of these is PreQuest, the software used by the CCDC's scientific editors to create value-added and fully checked CSD entries. This software is released so that users of the CSD system can create their own in-house databases of proprietary structures in CSD format. These private databases can be searched separately or together with the main CSD. The other five software components fall into two categories: (1) software for CSD access, structure visualization, and data analysis, and (2) knowledge-based libraries that provide click-of-a-button access to many millions of geometrical data items derived principally from the CSD, but also incorporating some information from the PDB.6

3.18.3.1 Searching, Visualizing, and Analyzing Cambridge Structural Database Information

3.18.3.1.1 The ConQuest search program

ConQuest3 provides search, retrieval, and display facilities for the CSD. Individual queries can be entered to interrogate the bibliographic, chemical text and numerical fields listed in Table 1. Most importantly (Figure 3), ConQuest provides extensive graphical facilities for defining 2D and 3D substructure searches. The 2D searches interrogate the chemical connection tables alone, while the internal mapping of atomic coordinates and connection tables forms the basis for:

• Systematic 3D substructure searching. This can be (1) intramolecular, for example to locate 3D pharmacophoric patterns or to retrieve substructures having specific conformations by use of torsion angle constraints, or (2) intermolecular, where searching is applied to extended crystal structures, for example to locate hydrogen bonds or other non-bonded interactions, again using appropriate chemical and geometrical constraints.

• The retrieval of calculated 3D geometrical parameters for each occurrence of the defined substructure, as shown in Figure 3. These data can then be used in further analyses (e.g., using the Vista program described below).
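The geometry behind a torsion-angle constraint can be illustrated with a short calculation. The sketch below is not ConQuest's code: it computes a signed torsion angle from four 3D points with the standard atan2 formulation, builds a ring centroid to serve as a dummy point (as in the cent-C-C=O query of Figure 3), and tests the result against an angular range. The function names and the sign-ignoring range test are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def centroid(points: Sequence[Vec]) -> Vec:
    """Mean position of a set of atoms, e.g. a ring centroid used as a dummy atom."""
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def torsion(p1: Vec, p2: Vec, p3: Vec, p4: Vec) -> float:
    """Signed torsion angle p1-p2-p3-p4 in degrees, in the range [-180, 180]."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals to the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def matches_constraint(angle: float, lo: float, hi: float) -> bool:
    """Hypothetical constraint test on the torsion magnitude (sign ignored)."""
    return lo <= abs(angle) <= hi
```

For a cyclopropyl-carbonyl fragment, one would call `torsion(centroid(ring_atoms), c_ring, c_carbonyl, o)`, where `ring_atoms` and the other coordinates are hypothetical inputs read from a CSD entry.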

ConQuest has facilities for combining individual queries, including 2D and 3D substructure queries, using Boolean logic. The program also displays the information content of CSD entries, selected from either the main database, or from the subset of entries resulting from a search. Display panes show bibliographic and chemical text, crystal data, 2D chemical diagrams, and 3D molecular or crystal structures. ConQuest can access Mercury or Vista directly (see below), to provide more extensive 3D structure visualization or data analysis facilities. ConQuest can also output information for search hits in a variety of formats (e.g., cif and mol2), and transfer data to other programs. Currently (2005), ConQuest is being upgraded to provide links from CSD entries to the electronic literature.

Figure 3 (a) ConQuest query of the cent-C-C=O torsion angle of cyclopropyl-carbonyl and (b) the corresponding Vista-generated histogram of the torsion angle (cent is the centroid of the cyclopropyl ring).
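Combining queries with Boolean logic can be pictured as composing predicates over database entries. The sketch below is a hypothetical model, not ConQuest's query engine: each entry is a plain dictionary with invented fields (`fragment_hit`, `r_factor`, `disordered`) and placeholder refcodes, and `AND`, `OR`, and `NOT` build a combined query from individual ones.

```python
from typing import Callable, Dict, List

# A CSD-like entry modelled as a dict of hypothetical fields.
Entry = Dict[str, object]
Query = Callable[[Entry], bool]


def AND(*queries: Query) -> Query:
    return lambda entry: all(q(entry) for q in queries)


def OR(*queries: Query) -> Query:
    return lambda entry: any(q(entry) for q in queries)


def NOT(query: Query) -> Query:
    return lambda entry: not query(entry)


def run(entries: List[Entry], query: Query) -> List[str]:
    """Return the refcodes of the entries that satisfy the combined query."""
    return [str(e["refcode"]) for e in entries if query(e)]


# Individual (hypothetical) queries: a substructure hit flag and numeric/text tests.
def has_fragment(e: Entry) -> bool:
    return bool(e["fragment_hit"])


def low_r(e: Entry) -> bool:
    return float(e["r_factor"]) < 0.05


def disordered(e: Entry) -> bool:
    return bool(e["disordered"])


entries: List[Entry] = [
    {"refcode": "HYP001", "fragment_hit": True, "r_factor": 0.032, "disordered": False},
    {"refcode": "HYP002", "fragment_hit": True, "r_factor": 0.071, "disordered": False},
    {"refcode": "HYP003", "fragment_hit": True, "r_factor": 0.041, "disordered": True},
    {"refcode": "HYP004", "fragment_hit": False, "r_factor": 0.029, "disordered": False},
]

combined = AND(has_fragment, low_r, NOT(disordered))
print(run(entries, combined))  # ['HYP001']
```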

3.18.3.1.2 The Mercury visualizer

Mercury3 provides general and advanced functionality for viewing 3D molecular and crystal structures, as summarized in Table 3. A unique feature of Mercury applied to CSD entries is its ability to import chemical bond types from the 2D connection tables and display them on 3D images, as shown in Figure 4a.

However, the most important functionality in Mercury, and one which is vital in supramolecular studies, is the ability to locate, build, and display networks of intermolecular and intramolecular hydrogen bonds, short nonbonded contacts, and user-specified contact types. Contact criteria can be specified either relative to the sum of the van der Waals radii of the atoms involved, or as absolute distances in ångströms. An example hydrogen-bonded network, constructed and viewed in Mercury, is shown in Figure 4b. The facilities for displaying a slice through a crystal in any direction are illustrated in Figure 4c, and such displays can be valuable in rationalizing crystal morphology and predicting how to control it. Finally, Mercury has the ability to read several structures into the same visualization window and manipulate and overlay them separately. This is especially valuable when comparing conformations, examining differences between polymorphic forms, etc.
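The two contact criteria can be illustrated with a brute-force pair search. This is a sketch, not Mercury's implementation: it assumes Cartesian coordinates in ångströms, a molecule label on each atom, and that symmetry-generated neighbouring molecules have already been added. The radii are Bondi's van der Waals values for a few elements; the function name and its `tolerance` and `absolute_cutoff` arguments are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

# Bondi van der Waals radii (angstroms) for a few common elements.
VDW_RADII: Dict[str, float] = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80, "Cl": 1.75}

# (element, molecule id, Cartesian coordinates)
Atom = Tuple[str, int, Tuple[float, float, float]]


def short_contacts(atoms: List[Atom], tolerance: float = 0.0,
                   absolute_cutoff: Optional[float] = None) -> List[Tuple[int, int, float]]:
    """Return (i, j, distance) for intermolecular pairs closer than the cutoff.

    If absolute_cutoff is given, it is used directly; otherwise the cutoff is
    the sum of the two van der Waals radii plus `tolerance`.
    """
    hits = []
    for i in range(len(atoms)):
        el_i, mol_i, xyz_i = atoms[i]
        for j in range(i + 1, len(atoms)):
            el_j, mol_j, xyz_j = atoms[j]
            if mol_i == mol_j:
                continue  # only intermolecular contacts are reported
            d = math.dist(xyz_i, xyz_j)
            if absolute_cutoff is not None:
                cutoff = absolute_cutoff
            else:
                cutoff = VDW_RADII[el_i] + VDW_RADII[el_j] + tolerance
            if d < cutoff:
                hits.append((i, j, round(d, 3)))
    return hits
```

A negative `tolerance` tightens the criterion (contacts well inside the radii sum), which is one common way of picking out only the strongest interactions.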

3.18.3.1.3 Data analysis using Vista

Vista displays molecular geometry and other parameters relating to a molecular or supramolecular substructure in a spreadsheet format. These data are normally retrieved from the CSD using ConQuest according to user-supplied specifications (e.g. see Figure 3). Vista performs a variety of analysis and display functions, including generation of:

• histograms and scattergrams of parameter distributions referred to Cartesian or polar axes;

• simple descriptive statistics for parameter distributions;

• statistical analyses, including linear regression and principal-component analysis;

• hyperlinking from spreadsheet data back to the original CSD entry; and

• preparation of plots for reports and publications.
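The core of such an analysis can be sketched with Python's standard `statistics` module. This is an illustration of the kinds of output listed above, not Vista's code: it covers descriptive statistics, an equal-width histogram, and an ordinary least-squares line (principal-component analysis is omitted). The function names are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Simple descriptive statistics for one parameter column."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation, needs n >= 2
        "min": min(values),
        "max": max(values),
    }


def histogram(values: Sequence[float], lo: float, hi: float, nbins: int) -> List[int]:
    """Counts in nbins equal-width bins spanning [lo, hi]; values outside are ignored."""
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for v in values:
        if lo <= v <= hi:
            k = min(int((v - lo) / width), nbins - 1)  # v == hi goes in the last bin
            counts[k] += 1
    return counts


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept of y = slope * x + intercept."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx
```

For a torsion-angle column such as the one in Figure 3, `histogram([abs(t) for t in torsions], 0.0, 180.0, 18)` would give 10° bins of torsion magnitude, where `torsions` is a hypothetical list of retrieved values.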

3.18.3.2 Knowledge-Based Libraries of Structural Information

The software facilities described in Section 3.18.3.1 permit CSD information to be searched and analyzed in a very comprehensive manner. However, while efficient in themselves, they can be time-consuming for scientists who wish to access standard geometrical data, such as the mean value of a particular type of bond, the conformational preferences exhibited by a specific substructure, or the spatial characteristics and metrics of a common hydrogen bond. For this reason, the CCDC has compiled two structural knowledge bases designed to provide instant click-of-a-button access to a very wide range of information on geometrical structure, both intramolecular, via Mogul,4 and intermolecular, via IsoStar.5

Table 3 Principal facilities of the Mercury visualizer

• Browse the entire CSD, load hit lists from ConQuest searches, or read in crystal structure data in other common formats (mol2, pdb, cif, mol)

• Rotate, translate and scale the 3D crystal structure display and view down cell axes, reciprocal cell axes, and normals to planes

• Range of visualization options (different display styles, coloring and labeling options, ability to hide and then redisplay atoms and molecules, etc.)

• Measure distances, angles, and torsion angles

• Create and display centroids, least-squares mean planes, and Miller planes

• Display anisotropic displacement parameters as ellipsoids

• Display unit cell axes and the contents of any number of unit cells in any direction (including fractions of unit cells)

• Locate, display, and build networks of hydrogen bonds or other nonbonded contacts

• Display a slice through the crystal in any direction

• View, and superpose, two or more structures in the same window

• Display CSD entry information, including 2D chemical diagrams

• Save images in a variety of formats
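Displaying the contents of several unit cells, as listed in Table 3, rests on converting fractional coordinates to Cartesian ones. The sketch below assumes the common orthogonalization convention (a along x, b in the xy-plane) and is not taken from Mercury; the function names are hypothetical. Translating fractional positions by integer cell offsets generates neighbouring cells, and filtering the packed positions by a fractional range would give partial cells.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]
Matrix = Tuple[Vec, Vec, Vec]


def frac_to_cart_matrix(a: float, b: float, c: float,
                        alpha: float, beta: float, gamma: float) -> Matrix:
    """Orthogonalization matrix with a along x and b in the xy-plane (angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    root = math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)
    return ((a, b * cg, c * cb),
            (0.0, b * sg, c * (ca - cb * cg) / sg),
            (0.0, 0.0, c * root / sg))


def to_cartesian(m: Matrix, f: Vec) -> Vec:
    """Multiply the orthogonalization matrix by a fractional coordinate vector."""
    return (m[0][0] * f[0] + m[0][1] * f[1] + m[0][2] * f[2],
            m[1][0] * f[0] + m[1][1] * f[1] + m[1][2] * f[2],
            m[2][0] * f[0] + m[2][1] * f[1] + m[2][2] * f[2])


def pack_cells(frac_atoms: List[Vec], na: int, nb: int, nc: int) -> List[Vec]:
    """Copy fractional positions into an na x nb x nc block of unit cells."""
    return [(u + i, v + j, w + k)
            for i in range(na) for j in range(nb) for k in range(nc)
            for (u, v, w) in frac_atoms]
```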

Figure 4 Mercury plots of the CSD entry LECNEM (an aminomethylpyridine-chloro-dimethylsulfoxide-palladium complex thought to have antitumor properties) showing (a) the 3D structure, (b) short-range interactions (within the range of van der Waals radii), and (c) a slice through the unit cell.

3.18.3.2.1 Mogul: a knowledge base of intramolecular geometry

Mogul4 operates by searching precomputed libraries of bond lengths, valence angles, and torsion angles derived from the CSD. The libraries are built in the following manner: (1) bond, angle, and torsional fragments (acyclic torsions only, at present) are constructed from every entry in the CSD; (2) these fragments are then classified by the evaluation of their 'key components' (i.e., their atom-based and bond-based properties); and (3) the fragments are then grouped on the basis of these components, together with the value of the geometric feature. This process generates a search tree, which enables very fast searching based only on the 'key components' of a query fragment,4 thus obviating the need for the computationally intensive atom-by-atom, bond-by-bond searching inherent in normal substructure searching methodologies.
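The benefit of precomputed, key-based libraries can be shown with a toy index. In this sketch a Python dictionary stands in for Mogul's search tree, and the "key components" are reduced to a hypothetical tuple of element and ring membership for each end atom plus a bond-type label; Mogul's real key components are richer. Once the library is built, a query needs only its key, so retrieval is a single lookup rather than atom-by-atom matching.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical key components for a bond fragment:
# (element, in_ring) for each end atom plus a bond-type label.
Key = Tuple[Tuple[str, bool], Tuple[str, bool], str]


def bond_key(el1: str, ring1: bool, el2: str, ring2: bool, bond_type: str) -> Key:
    """Canonical key: the two end atoms are sorted so A-B and B-A coincide."""
    end_a, end_b = sorted([(el1, ring1), (el2, ring2)])
    return (end_a, end_b, bond_type)


def build_library(fragments: Iterable[Tuple[Key, float, str]]) -> Dict[Key, List[Tuple[float, str]]]:
    """Group (key, value, refcode) fragments by key; sort each group by value."""
    library: Dict[Key, List[Tuple[float, str]]] = defaultdict(list)
    for key, value, refcode in fragments:
        library[key].append((value, refcode))
    for group in library.values():
        group.sort()
    return dict(library)


def lookup(library: Dict[Key, List[Tuple[float, str]]], key: Key) -> List[Tuple[float, str]]:
    """Retrieve all stored values for a query key with one dictionary lookup."""
    return library.get(key, [])
```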

To view data concerning a bond, angle, or torsion angle, a query molecule (2D or 3D) is either sketched or imported into the Mogul interface, from the CSD or from a file in one of a variety of formats (mol2, cif, res, pdb, etc.). The user selects the geometric feature of interest (Figure 5a), the software computes the 'key components' of the chemical fragment, and the search is carried out. A histogram of the required distribution is obtained in seconds, together with descriptive statistics (Figure 5b). Occasionally, a histogram may contain rather few hits because the search fragment environment occurs infrequently in the CSD. Although these hits match the query fragment exactly, there are too few of them for the histogram to be useful. In these situations, it is possible to generalize the search (i.e., relax the level of fragment environment specification), so that additional chemically related hits are obtained. The results of a generalized search can be added to the histogram obtained from the exact search, and all hits are listed by their similarity/relevance to the query fragment. The user may choose the level of fragment similarity to be included in the final distribution. Complete geometry searches (i.e., to locate CSD distributions for all bonds, angles, and torsions in the target molecule) may also be carried out.
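Search generalization can be sketched as walking down a hierarchy of increasingly loose fragment descriptions until enough hits have accumulated. The structure below is an assumption made for illustration, not Mogul's internal scheme: `levels[k]` is a hypothetical index from a level-k key to fragment ids (level 0 being the exact description), and each hit records the level at which it was found, so hits can be ranked by similarity to the query.

```python
from typing import Dict, List, Sequence, Tuple


def generalized_search(levels: Sequence[Dict[str, List[str]]],
                       query_keys: Sequence[str],
                       values: Dict[str, float],
                       min_hits: int) -> List[Tuple[str, float, int]]:
    """Relax the fragment description level by level until enough hits are found.

    levels[k] maps a level-k key to fragment ids (level 0 = exact match);
    query_keys[k] is the query fragment's key at level k. Hits are returned as
    (fragment id, value, level), so a lower level means a closer match.
    """
    hits: List[Tuple[str, float, int]] = []
    seen = set()
    for level, (index, key) in enumerate(zip(levels, query_keys)):
        for fid in index.get(key, []):
            if fid not in seen:  # a looser level also contains the stricter hits
                seen.add(fid)
                hits.append((fid, values[fid], level))
        if len(hits) >= min_hits:
            break
    return hits
```

Keeping the level with each hit mirrors the idea that the user can decide how dissimilar a fragment may be before it is excluded from the final distribution.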

It is possible to hyperlink from bar(s) in the Mogul histograms back to the CSD entries that generated those values; individual 2D and 3D structures can be viewed, together with data such as the CSD reference code, publication details, compound name, formula, and R-factor. The hyperlinking feature is particularly useful for examining outliers, and for identifying chemical effects that may be responsible for multimodality, especially in torsion angle distributions. Data can also be retrieved from the precomputed libraries via an instruction file interface, and this facility makes it possible to integrate Mogul quite readily with third-party software, as discussed in Section 3.18.4.3.

Figure 5 Results of a Mogul search for a C-S-S-C torsion. (a) The search was carried out by selecting the relevant atoms that make up the torsion in the query molecule. (b) The results are presented in a distribution histogram, illustrating that the preferred C-S-S-C torsion is around 90°.

Figure 6 IsoStar scatterplots of a charged carboxylate central group and an OH contact group taken from (a) the CSD and (b) the PDB. (c) A CSD plot contoured on the donor oxygen atoms.

3.18.3.2.2 IsoStar: a knowledge base of intermolecular interactions

IsoStar5 gathers together a vast amount of information on intermolecular interactions in a readily accessible form. For a given contact between a central group (A) and a contact group (B), the CSD search results for an interaction A···B are transformed into an easily visualized form by overlaying the A moieties. This results in a 3D scatterplot (Figure 6a and b), showing the experimental distribution of the B moieties around the static central group (A); these scatterplots can also be presented in contoured form (Figure 6c). IsoStar contains data retrieved from the CSD and from protein-ligand complexes stored in the PDB. IsoStar also contains over 1500 potential energy minima calculated using distributed multipole analysis and intermolecular perturbation theory.10
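The overlay step can be illustrated by expressing each contact atom in a coordinate frame fixed to the central group, so that contacts from many structures land in one common frame. IsoStar's own superposition procedure is not described here; the sketch below substitutes a simple three-atom reference frame (origin at the first atom, x axis towards the second, third atom in the xy-plane) for a least-squares fit, and the function names are hypothetical.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]
Frame = Tuple[Vec, Tuple[Vec, Vec, Vec]]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _unit(v: Vec) -> Vec:
    n = math.sqrt(_dot(v, v))
    return (v[0] / n, v[1] / n, v[2] / n)


def local_frame(a1: Vec, a2: Vec, a3: Vec) -> Frame:
    """Orthonormal frame from three central-group atoms: origin a1, x along a1->a2."""
    e1 = _unit(_sub(a2, a1))
    v = _sub(a3, a1)
    d = _dot(v, e1)
    e2 = _unit((v[0] - d * e1[0], v[1] - d * e1[1], v[2] - d * e1[2]))  # Gram-Schmidt step
    e3 = _cross(e1, e2)  # completes a right-handed frame
    return a1, (e1, e2, e3)


def to_local(p: Vec, frame: Frame) -> Vec:
    """Coordinates of point p (e.g. a contact atom B) in the central group's frame."""
    origin, (e1, e2, e3) = frame
    d = _sub(p, origin)
    return (_dot(d, e1), _dot(d, e2), _dot(d, e3))
```

Collecting `to_local(b, local_frame(*a_atoms))` over every retrieved A···B contact, with `a_atoms` and `b` as hypothetical per-structure inputs, yields points that can be plotted as a single scatterplot around a static central group.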

Version 1.7 of IsoStar, released in December 2004, covers 300 central groups and 48 contact groups, and contains around 20 000 CSD-based scatterplots and 5500 PDB-based scatterplots. The user may interact with the basic scatterplots to generate contoured surfaces, change the A···B distance limit for data presentation, control the display style, etc. As in Mogul, the scatterplot data are hyperlinked to the master CSD and PDB files, so that the structural origin and environment of a specific interaction can be examined in detail; this makes it easy to investigate outliers in any scatterplot and the full chemical nature of the contacting atoms. IsoStar therefore holds a wealth of information of use in supramolecular chemistry, crystal engineering, and organic crystal chemistry, and also provides ready access to information that is invaluable in studying protein-ligand interactions as part of the rational drug design process.
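Applying a distance limit and contouring overlaid data can be approximated in a few lines. IsoStar's contouring method is not described here, so this sketch uses a plain grid count as a stand-in: contact positions (already in the central group's frame) are kept if they lie within a chosen limit of the nearest central-group atom (an assumed definition of the A···B distance), binned into cubic cells, and cells reaching a count threshold mark one crude contour level. All names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
Cell = Tuple[int, int, int]


def within_limit(points: Sequence[Vec], central_atoms: Sequence[Vec], limit: float) -> List[Vec]:
    """Keep contact positions within `limit` angstroms of the nearest central-group atom."""
    return [p for p in points if min(math.dist(p, a) for a in central_atoms) <= limit]


def density_grid(points: Sequence[Vec], spacing: float) -> Counter:
    """Count overlaid contact positions in cubic grid cells of the given edge length."""
    grid: Counter = Counter()
    for x, y, z in points:
        grid[(math.floor(x / spacing), math.floor(y / spacing), math.floor(z / spacing))] += 1
    return grid


def contour_cells(grid: Counter, level: int) -> List[Cell]:
    """Grid cells whose count reaches `level`: a crude stand-in for one contour level."""
    return sorted(cell for cell, count in grid.items() if count >= level)
```

A finer `spacing` resolves more detail but needs more data per cell, which is the usual trade-off when turning a sparse scatterplot into a density surface.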
