MolPort Available Compound Database


The MolPort database contains data and prices for over 7 million purchasable chemical compounds available from stock and for over 20 million made-to-order compounds. You can find these compounds and view their commercial data on our e-shop www.molport.com. However, certain types of data analysis (filtering by specific parameters, diversity selection, virtual screening or docking, etc.) require you to have this data locally. We are making our chemical structure database available for you to download. Access to the MolPort database is free of charge. Extended data access requires the user to have a free registered account.

You can access data from multiple suppliers in a standardized form. Files can be downloaded via FTP or HTTP (i.e. via a web browser). The data is available in two common formats: SMILES and SD files. Additional properties are stored in a separate properties file. Incremental monthly updates for these files allow you to keep your local copy current without downloading the full database again.

Extended data, such as the available amount, may change quickly; the current value can be checked using the MolPort web services (API). With web services, you can instantly get up-to-date information for compounds from the software of your choice: your in-house application, Microsoft Excel, Pipeline Pilot or KNIME nodes. You can use the API even without downloading the full files, since you can search by the SMILES you provide.

The data can be accessed in several ways. Overview of the available data and data access options:

| Data | Open access FTP files (SDF/SMILES) | FTP SDF (2D) files | FTP SMILES files | FTP Properties file | FTP Monthly update files (SDF/SMILES) | HTTP download (SDF/SMILES) | Chemical Search API (also via Excel, KNIME, Pipeline Pilot) | Full Molecule Load API |
|---|---|---|---|---|---|---|---|---|
| Data update frequency | Monthly | Monthly | Monthly | Monthly | Monthly | Monthly | Daily | Daily |
| Structure | Y | Y | Y | N | Y | Y | Y | Y |
| MolPort ID | Y | Y | Y | Y | Y | Y | Y | Y |
| Maximum stock amount verified with supplier | N | Y | N | Y | Y | Y/N | Y | N |
| Maximum unverified amount (as package size) | N | Y | N | Y | Y | Y/N | Y | N |
| Is available as Screening Compound? | N | Y | N | Y | Y | Y/N | N | N |
| Is Building Block? | N | Y | N | Y | Y | Y/N | N | N |
| Best lead time | N | Y | Y | Y | Y | Y | N | N |
| Price range for 1 mg | N | Y | Y | Y | Y | Y | N | N |
| Price range for 5 mg | N | Y | Y | Y | Y | Y | N | N |
| Price range for 50 mg | N | Y | Y | Y | Y | Y | N | N |
| QC methods | N | Y | N | Y | Y | Y/N | N | N |
| Compound state | N | Y | N | Y | Y | Y/N | N | N |
| InChI | N | N | Y | Y | N | Y/N | N | N |
| InChI Key | N | N | Y | Y | N | Y/N | N | N |
| IUPAC name | N | N | N | Y | N | N | N | Y |
| Direct link to molecular page | Y | Y | N | Y | Y | Y/N | Y | Y |
| Download size (compact archive) | 2 GB/100 MB | ~2 GB | ~500 MB | ~500 MB | ~30-50 MB | 2 GB/500 MB | 0 | 0 |
| Full file size | 20 GB/1 GB | ~20 GB | ~2 GB | ~2 GB | 300-500 MB | 20 GB/2 GB | 0 | 0 |
| Number of files | 2 (1 per type) | 15 | 15 | 1 | 3* | 2 (1 per type) | 0 | 0 |
| Suppliers | N | N | N | N | N | N | N | Y |
| Catalog numbers | N | N | N | N | N | N | N | Y |
| Prices | N | N | N | N | N | N | N | Y |
| Pack sizes | N | N | N | N | N | N | N | Y |
| Shipping costs | N | N | N | N | N | N | N | Y |

* See the Monthly Update File section.



Data Formats:


SDF

SDF stands for “structure-data file”. This is a text file with a predefined format for storing chemical data. Historically it is the most popular standard for storing 2D structures of molecules: each atom is given coordinates, and a connectivity table describes the bonds between atoms. Each molecule has its properties stored after the molecule block. Most chemical software can use SDF files directly or convert them to an internal representation using a built-in import process.
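As a rough illustration of the layout described above, the Python sketch below splits SDF text into records, each consisting of a molecule block and the properties that follow it. It relies only on the general SDF conventions (data headers such as `> <NAME>` and the `$$$$` record terminator). The sample record and the field name `EXAMPLE_ID` are made up for illustration and are not taken from MolPort files.

```python
from typing import Dict, Iterable, Iterator, Tuple


def iter_sdf_records(lines: Iterable[str]) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (molecule_block, properties) for each record in an SDF line stream."""
    mol_lines = []
    props: Dict[str, str] = {}
    current = None      # name of the data field currently being read
    in_props = False    # True once the first data header has been seen
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.strip() == "$$$$":
            # Record terminator: emit what was collected and start over.
            yield "\n".join(mol_lines), props
            mol_lines, props, current, in_props = [], {}, None, False
        elif line.startswith(">") and "<" in line:
            # Data header such as "> <EXAMPLE_ID>".
            current = line[line.index("<") + 1 : line.rindex(">")]
            props[current] = ""
            in_props = True
        elif in_props:
            if not line:
                current = None  # a blank line closes the current data field
            elif current is not None:
                props[current] = f"{props[current]}\n{line}" if props[current] else line
        else:
            mol_lines.append(line)


# Made-up single-atom record, used only to exercise the parser.
SAMPLE = """\
example-1
  sketch

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <EXAMPLE_ID>
X-001

$$$$
"""

records = list(iter_sdf_records(SAMPLE.splitlines()))
```

For real work a cheminformatics toolkit is the safer choice; this sketch only shows where the structure ends and the per-molecule properties begin.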


SMILES

SMILES is an acronym for 'Simplified Molecular Input Line Entry System'. This format includes atom types and connectivity; the 2D structure is generated by software on the fly when it is needed. It is much more compact and better suited to storage in standard databases, text fields or spreadsheets, since each structure is a single compact text string. Most modern chemical software can use this format as well.

The same structure can be written in SMILES in different ways, depending on the starting atom. To detect identical molecules, a set of rules defined by algorithms is applied to standardize the representation and obtain the so-called Canonical SMILES. Obtaining canonical SMILES may fail for some structures. Keep in mind that these rules can differ between software packages.

Our SMILES files are tab-separated text files. They contain structures in the SMILES format (which may contain ChemAxon SMILES extensions when needed), Canonical SMILES (created using the Chemistry Development Kit), MolPort IDs and other properties. The first line contains column headers. Should your chemical data software not allow importing or opening SMILES in text files, you may need to change the file extension to “.smi” or “.smiles”. The files can also be opened for review with a basic text editor or in Microsoft Excel.

More details can be found in the OpenSMILES Specification.
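The tab-separated layout with a header line can be read with standard tools. The Python sketch below loads such a file and groups MolPort IDs by their canonical SMILES, which is one way to spot identical molecules written differently. The column names `SMILES`, `SMILES_CANONICAL` and `MOLPORTID`, as well as the sample rows, are assumptions made for this example; check the header line of the actual file for the real names.

```python
import csv
import io
from typing import Dict, Iterable, List


def read_smiles_table(handle: Iterable[str]) -> List[Dict[str, str]]:
    """Read a tab-separated file whose first line holds the column headers."""
    # QUOTE_NONE keeps every character of a SMILES string untouched.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def group_by_canonical(rows, canonical_col: str, id_col: str) -> Dict[str, List[str]]:
    """Map each canonical SMILES to the list of IDs that share it."""
    groups: Dict[str, List[str]] = {}
    for row in rows:
        groups.setdefault(row[canonical_col], []).append(row[id_col])
    return groups


# Made-up sample: ethanol written from two different starting atoms, plus benzene.
SAMPLE = (
    "SMILES\tSMILES_CANONICAL\tMOLPORTID\n"
    "OCC\tCCO\tID-1\n"
    "CCO\tCCO\tID-2\n"
    "c1ccccc1\tc1ccccc1\tID-3\n"
)

rows = read_smiles_table(io.StringIO(SAMPLE))
groups = group_by_canonical(rows, "SMILES_CANONICAL", "MOLPORTID")
```

Because canonicalization rules differ between software packages, grouping is only meaningful when all canonical strings come from the same tool.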


Properties File

A properties file is a compressed, tab-separated text file with MolPort IDs and the additional molecule properties listed in the table above. It is available for FTP download. You can use this file to supplement the information contained in the Open Access Files or SMILES files.
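Supplementing structure data with the properties file amounts to a join on the MolPort ID. The Python sketch below shows one way to do it, assuming the file is gzip-compressed (the compression format is not stated here) and that the ID column is called `MOLPORTID`; both are assumptions, as are the sample values.

```python
import csv
import gzip
import io
from typing import Dict, List


def load_properties(gz_source, id_col: str) -> Dict[str, Dict[str, str]]:
    """Read a gzip-compressed, tab-separated properties file keyed by ID.

    gz_source may be a path or a binary file object.
    """
    with gzip.open(gz_source, "rt", encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return {row[id_col]: row for row in reader}


def append_properties(rows, properties, id_col: str) -> List[Dict[str, str]]:
    """Return copies of rows extended with any properties found for their ID."""
    merged = []
    for row in rows:
        extra = properties.get(row[id_col], {})
        # Values already present in the row win over values from the properties file.
        merged.append({**extra, **row})
    return merged


# Made-up data, compressed in memory so the example needs no files on disk.
raw = "MOLPORTID\tINCHI_KEY\nID-1\tKEY-A\n".encode("utf-8")
properties = load_properties(io.BytesIO(gzip.compress(raw)), "MOLPORTID")

structures = [
    {"MOLPORTID": "ID-1", "SMILES": "CCO"},
    {"MOLPORTID": "ID-9", "SMILES": "C"},
]
merged = append_properties(structures, properties, "MOLPORTID")
```

Rows without a match in the properties file are passed through unchanged.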



Access to MolPort Data:

Open Access Files

These files allow downloading information on all stock compounds available on MolPort.com from an FTP server with standard credentials (does not require a registered account).
        FTP: ftp://molport.com/
        User name: MolPortUser
        Password: MolPortUser

Two files, named MolPortAllStockCompounds (SMILES and SDF), contain information on all stock compounds. In these files you can review structures and MolPort IDs and find a direct link to the dedicated molecule page on molport.com, where additional information about suppliers, prices, purity, delivery time, shipping costs, etc. is available. These files are updated monthly.
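The open access download can be scripted with Python's standard `ftplib`. The sketch below uses the host and credentials given above and looks for entries whose names start with `MolPortAllStockCompounds`. It assumes the files sit in the directory you land in after login, and the file names in the offline example are invented, since the exact extensions are not listed here.

```python
from ftplib import FTP
from typing import Iterable, List

HOST = "molport.com"
USER = "MolPortUser"
PASSWORD = "MolPortUser"
PREFIX = "MolPortAllStockCompounds"


def find_open_access_files(names: Iterable[str]) -> List[str]:
    """Keep only the entries whose base name starts with the open access prefix."""
    return sorted(n for n in names if n.rsplit("/", 1)[-1].startswith(PREFIX))


def list_open_access_files() -> List[str]:
    """Log in and list matching files (requires network access)."""
    with FTP(HOST, timeout=30) as ftp:
        ftp.login(USER, PASSWORD)
        # Assumption: the files are in the login directory.
        return find_open_access_files(ftp.nlst())


def download(name: str, target_path: str) -> None:
    """Fetch one file in binary mode (requires network access)."""
    with FTP(HOST, timeout=30) as ftp, open(target_path, "wb") as out:
        ftp.login(USER, PASSWORD)
        ftp.retrbinary(f"RETR {name}", out.write)


# Offline check of the filtering step with invented file names.
example = find_open_access_files(
    ["readme.txt", "MolPortAllStockCompounds.smi.gz", "MolPortAllStockCompounds.sdf.gz"]
)
```

Calling `list_open_access_files()` and then `download(...)` for each result would fetch both files; neither function is invoked here because they need a live connection.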



Standard FTP Download

An extended version of the files described above is available for users who prefer to work with advanced data. These files can be accessed for free after acquiring a login and password. Please fill out this form to receive your credentials. You will receive instructions for downloading the files with our response. Note: only users with a corporate or university email address can receive valid credentials.

Files on our FTP server are stored in folders that are created monthly and named accordingly, for example “2018-04”. Each subset (see the Data Subsets for Download section below) consists of a list of compressed files, where each file contains 500,000 compounds in SDF or SMILES format with the associated data. Older folders contain the information necessary for updating data to the current version (see below).
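Since the monthly folders follow a year-month pattern like “2018-04”, a script can pick the newest one from a directory listing, and the 500,000-compounds-per-file rule gives a rough idea of how many files to expect. The Python sketch below illustrates both; the helper names are hypothetical and the listing is invented.

```python
import re
from datetime import date
from typing import Iterable, Optional

# Matches folder names of the form YYYY-MM, e.g. "2018-04".
FOLDER_RE = re.compile(r"^\d{4}-\d{2}$")


def monthly_folder(day: date) -> str:
    """Folder name for the month containing the given day."""
    return f"{day.year:04d}-{day.month:02d}"


def latest_monthly_folder(names: Iterable[str]) -> Optional[str]:
    """Newest monthly folder in a listing, or None if there is none."""
    months = [n for n in names if FOLDER_RE.match(n)]
    # Zero-padded YYYY-MM names sort chronologically as plain strings.
    return max(months) if months else None


def expected_file_count(n_compounds: int, per_file: int = 500_000) -> int:
    """Number of files needed when each holds at most per_file compounds."""
    return -(-n_compounds // per_file)  # ceiling division


newest = latest_monthly_folder(["2018-02", "2018-04", "2018-03", "readme"])
n_files = expected_file_count(7_200_000)  # an illustrative compound count
```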



Monthly Update File

Each month we create a special folder – “Changed Since Previous Update”, which contains data on added compounds (SDF and SMILES files) and removed compounds (plain text files with MolPort IDs). The folder “Amount Data” contains tab separated text files with MolPort IDs and stock amount. This allows you to update the previously acquired information.
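Applying a monthly update to a local copy comes down to dropping the removed MolPort IDs and adding the new records. A minimal Python sketch of that step follows, with the local data held as a dictionary keyed by MolPort ID. The IDs and structures are invented, and the choice to let an added record win over a removal of the same ID is an assumption of this example.

```python
from typing import Dict, Set


def parse_removed_ids(text: str) -> Set[str]:
    """Read a plain text list of IDs, one per line, ignoring blank lines."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def apply_monthly_update(
    local: Dict[str, str], added: Dict[str, str], removed_ids: Set[str]
) -> Dict[str, str]:
    """Return a new mapping with removed IDs dropped and added records merged in."""
    updated = {mid: rec for mid, rec in local.items() if mid not in removed_ids}
    # Added records are applied last, so they override older entries.
    updated.update(added)
    return updated


# Invented data: ID -> SMILES.
local = {"ID-1": "CCO", "ID-2": "C"}
added = {"ID-3": "CC"}
removed = parse_removed_ids("ID-2\n\n")

current = apply_monthly_update(local, added, removed)
```

If several months were skipped, the same function can be applied once per monthly folder, oldest first.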



HTTP Downloads

If FTP access is not the ideal option for you, consider accessing the data via HTTP. Files obtained with this protocol are duplicates of the FTP downloads, with one difference: each data subset has only one file to download. Contact us to receive your credentials for using this option.



Downloads with Special Tools

The download process can be automated using scripts, automatic downloading or workflow automation tools. For example, this blog post shows how the download process for MolPort files is automated using KNIME.


API Access

Chemical Search API

We synchronize data with our suppliers as frequently as possible. 80% of stock compound inventory data is updated daily. These changes are reflected immediately on our website. However, downloadable files are updated only once a month. We created the MolPort web services (API) to allow you to check up-to-date inventory programmatically. Visit this page to find out more. To access data via web services, we provide KNIME nodes, Pipeline Pilot protocols, Excel templates and examples based on the Java, JavaScript, C# and Python programming languages.



Full Molecule Load API

Data collected from multiple suppliers is not suitable for storage in SDF or SMILES formats. Instead, you can use the Full Molecule Load API to download complete data for a specific molecule in JSON format. The data includes all the supplier details that you see on the dedicated compound pages of our website. Please note that the prices you get via web services do not include any volume discounts that may apply when ordering a larger number of compounds. Should you require a formal quotation, we suggest using the MolPort List Search. It calculates the best way to procure your compound set, and you can generate a formal quote online as a spreadsheet or in PDF format.



Data Subsets for Download:


Database

For your convenience we have divided the database download into the following:

All Stock Compounds
• FTP folder name: "All Stock Compounds"
• File names start with: "IIS"

Information on stock products that can be delivered in 2 weeks or sooner. The files combine Screening Compounds and Building Blocks and mark each compound with its type.

All Stock Screening Compounds
• FTP folder name: "Screening Compounds"
• File names start with: "IISSC"

Subset of "All Stock Compounds". Screening compounds are usually available in milligram quantities. Over 99% of stock screening compounds have a guaranteed purity of over 90% by H-NMR or LCMS.

All Stock Building Blocks
• FTP folder name: "Building Blocks"
• File names start with: "IISBB"

Subset of "All Stock Compounds" containing data on all stock building blocks. Building blocks are usually available in larger amounts than screening compounds. They can have higher purity or better characteristics, and their lead time is generally shorter. Building Blocks usually have one or more active functional groups, which allow them to be used to produce new compounds in a specific chemical reaction.

Made-to-Order Compounds
• FTP folder name: "Made To Order"
• File names start with: "V"

Made-to-order compounds are considered to be an addition to the Stock compounds. The files contain over 20 million products (both SC and BB) with predefined prices and suppliers. However, the prices for such compounds should be considered for reference purposes only. Suppliers tend to call such products “virtual”, “tangible”, “back ordered”, “to be synthesized” and so on. Lead time for made-to-order compounds varies from 4-6 weeks to 3 months. The estimated synthesis success rate is 50%-80%.

Full Database
• FTP folder name: "Full DB"
• File names start with: "fulldb"

This set combines Stock and made-to-order compounds. The files also contain all historical information on products, which were sold out, have no price defined or do not have specific suppliers assigned at the moment. Altogether this set consists of over 40 million chemical structures. Data provided in SMILES format only.
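When files from several subsets end up in one local folder, the file name prefixes listed above can be used to tell them apart. Note that "IIS" is itself a prefix of "IISSC" and "IISBB", so the longest prefix has to be checked first. The Python sketch below does this; the example file names are invented and matching is case-sensitive by assumption.

```python
from typing import Optional

# Prefix -> subset, as listed in the Data Subsets for Download section.
SUBSET_PREFIXES = {
    "IISSC": "All Stock Screening Compounds",
    "IISBB": "All Stock Building Blocks",
    "IIS": "All Stock Compounds",
    "fulldb": "Full Database",
    "V": "Made-to-Order Compounds",
}


def subset_for_file(filename: str) -> Optional[str]:
    """Return the subset a file belongs to, based on its name prefix."""
    # Try longer prefixes first so "IISSC..." is not mistaken for "IIS...".
    for prefix in sorted(SUBSET_PREFIXES, key=len, reverse=True):
        if filename.startswith(prefix):
            return SUBSET_PREFIXES[prefix]
    return None


# Invented file names, for illustration only.
labels = [subset_for_file(n) for n in ("IISSC-001.sdf.gz", "IIS-001.smi.gz", "notes.txt")]
```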



Q/A: Why don’t you provide supplier information in the files?

There are three reasons:
• Supplier-specific data is not suitable for storage in SDF or SMILES. We provide a link that allows you to check this data for a specific compound on our website or by using our web services (API).
• From early on, MolPort has pushed suppliers to provide reliable data. As a result, we consider the information on stock products reliable to the point where you do not need to filter out suppliers or give preference to any of them in advance.
• For single-product orders, the total order price depends heavily on the supplier's shipping cost to your region. For orders of multiple compounds, volume discounts may reduce the cost of some items by more than half. Our List Search feature allows you to take all these variables into account, compile the most suitable selection for your order and even generate a formal quote online. Furthermore, we can work with your list to select a less expensive compound per group or cluster of compounds (with similar properties or activity). Contact us to find out more.



Q/A: Can I get 3D structures in the Database?

We do not have this format available for structures. Generally, each company or research group has its own rules for 3D structure generation, including the number of tautomers or conformers generated per structure, a specific method of calculation, a vendor-specific format of structure output and so on.

However, we cooperate with software developers and 3D databases to have our data pre-calculated on their side in a format suitable for their products.

Our data is available for Schrödinger users and on ZINC docking.

Do you wish your software vendor to include MolPort data in their format? We are open for cooperation, please contact us and we will try to make it possible.



Q/A: Where else can I find MolPort data?

MolPort data is available from the following sources:
PubChem
ChemSpider
ZINC docking
Schrödinger
Binding DB (subset)

Our API is used for instant access by:
Optibrium StarDrop
ZINC docking


Would you like to use it in your platform? Do you have suggestions? Please contact us.

2018 MolPort, v2.56, release date 04-00-2018 10:00 (+0200). All rights reserved.