Chapter 2. Using Mychem

Table of Contents

2.1. The Database
2.1.1. Database Creation with SQL and Python Scripts
2.1.2. Database creation with mychemdb-manager.py
2.2. Examples
2.2.1. Calculate the Molecular Weight
2.2.2. Search a Substructure

This chapter provides a short tutorial on the usage of Mychem. It will present you a simple way to create a chemical database with Mychem and how to use some functions. More details about each function used in this tutorial are available in the Chapter 3, Command Reference.

2.1. The Database

A chemical database is composed of one or several tables. The following example use a set of four tables, however the use of one table can be sufficient when working with small databases.

  • compounds - a table containing an unique id for each molecule and its name.

  • 1D_structures - a table containing an unique reference to the compounds table and several types of 1D molecular descriptors (InChI code and SMILES).

  • 3D_structures - a table containing an unique reference to the compounds table and the 3D structure in MDL Molfile format.

  • bin_structures - a table containing an unique reference to the compounds table and several types of binary descriptors (fingerprints and serialized OBMol object).

This section describes two ways to create and fill these tables. It can be done manually by using SQL and Python scripts, or by using the mychemdb-manager.py program provided with Mychem.

2.1.1. Database Creation with SQL and Python Scripts

2.1.1.1. Table Set Creation

The structure of the database can be created with the following SQL code:

--
-- The 'mychem' database name is provided as example. You can
-- choose any other existing database.
--

CREATE DATABASE `mychem`;
USE mychem;


--
-- The `compounds` table is used to store an unique id for each
-- compound and its name.
--

CREATE TABLE IF NOT EXISTS `compounds` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(255) CHARSET utf8 COLLATE utf8_bin NOT NULL,
  `created` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `modified` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY id (`id`),
  KEY name (`name`) )
  ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin
  COMMENT='Compound Library';


--
-- The `1D_structures` table is used to store InChI codes and SMILES
-- strings. This table can be used for exact or substructure match
-- queries.
--

CREATE TABLE IF NOT EXISTS `1D_structures` (
  `compound_id` int(11) unsigned NOT NULL,
  `inchi` text NOT NULL,
  `smiles` text NOT NULL,
  PRIMARY KEY compound_id (`compound_id`) )
  ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin
  COMMENT='Compound 1D Structures';


--
-- The `3D_structures` table is used to store MDL Molfiles.
--

CREATE TABLE IF NOT EXISTS `3D_structures` (
  `compound_id` int(11) unsigned NOT NULL,
  `molfile` text NOT NULL,
  PRIMARY KEY compound_id (`compound_id`) )
  ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin
  COMMENT='Compound 3D Structures';


--
-- The `bin_structures` table is used to store binary descriptors.
-- This table can be used for similarity and substructure queries.
--

CREATE TABLE IF NOT EXISTS `bin_structures` (
  `compound_id` int(11) unsigned NOT NULL,
  `fp2` blob NULL,
  `obserialized` blob NULL,
  PRIMARY KEY compound_id (`compound_id`) )
  ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin
  COMMENT='Compound Binary Structures';

2.1.1.2. Inserting Data

The molecular structures contained in a MDL SDF file can be easily loaded into the database by using a Python or a Perl script. Such a script is listed in Appendix C, Loading MDL SDFFile with Python. Once the compounds and 3D_structures tables are filled, you can populate the two remaining tables. To insert data into the 1D_structures and bin_structures tables, use the following SQL code:

--
-- Database: mychem
--

USE mychem;


--
-- Insertion of data into the `1D_structures` table
--

INSERT INTO `1D_structures` (`compound_id`,`inchi`,`smiles`)
SELECT `compound_id`, MOLECULE_TO_INCHI(`molfile`),
MOLECULE_TO_SMILES(`molfile`) FROM `3D_structures`;


--
-- Insertion of data into the `bin_structures` table
--

INSERT INTO `bin_structures` (`compound_id`,`fp2`,`obserialized`)
SELECT `compound_id`, FINGERPRINT2(`molfile`),
MOLECULE_TO_SERIALIZEDOBMOL(`molfile`) FROM `3D_structures`;

The database is now fully usable. Additional tables can also be created. For example, a table storing chemical properties (molecular weight, LogP, ...) can be interesting.

2.1.2. Database creation with mychemdb-manager.py

All tasks described in the previous section can also be performed with the mychemdb-manager.py script, a Python program provided with Mychem. It can be found in the scripts directory. This program aims to provide a convenient interface for loading and managing chemical databases in MySQL. A complete overview of this script is detailed in Appendix D, Chemical Database Manager.