CHEMDISGENET

The Database Integration Workflow

15+
Databases Integrated

From public repositories to specialized datasets.

3
Core Data Categories

Chemical-Target, Side Effects, and Indications.

8
Key Process Stages

A meticulous workflow from download to final metrics.

The Master Workflow

1

Download

Automated & manual data retrieval.

2

Map

Standardize chemicals & diseases.

3

Process & Filter

Apply source-specific curation rules.

4

Summarize

Generate final data tables.

Our Data Sources

Integrating diverse information for a holistic view.

FAERS

The FDA Adverse Event Reporting System (FAERS) contains adverse event and medication error reports submitted to the FDA.

OFFSIDES & SIDER

SIDER catalogs drug side effects extracted from package inserts; OFFSIDES lists off-label side effects mined from FAERS adverse event reports.

ECHA

The European Chemicals Agency (ECHA) database contains information on chemicals registered in the EU and their hazardous properties.

ChEMBL

ChEMBL is a manually curated database of bioactive molecules with drug-like properties.

Papyrus

A large-scale curated dataset for bio- and chem-informatics, containing information on bioactive molecules and their targets.

And more...

We also integrate data from other sources, including ToxRef, IRIS, PPRTV, RAIS, EWAS Atlas, and EWAS Catalog.

Data Processing Deep Dive

Each data source has its own filtering and preprocessing rules to ensure data quality and consistency.

ChEMBL

Chemical-Target Associations

  • Confidence score ≥ 7
  • pChEMBL value ≥ 5
  • Human, rat, or mouse targets only
  • Exclude "Inconclusive" or "Not active"
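
As a minimal sketch of how the four ChEMBL rules above could be applied, the following Python filters a list of activity records. The field names (confidence_score, pchembl_value, organism, activity_comment) mirror ChEMBL column names, but the record layout, the species spellings, and the case-insensitive comment match are assumptions, not the workflow's actual code.

```python
# Hypothetical record layout: one dict per activity, with field names
# borrowed from ChEMBL columns. The exact input format is an assumption.
ALLOWED_ORGANISMS = {"Homo sapiens", "Rattus norvegicus", "Mus musculus"}
EXCLUDED_COMMENTS = {"inconclusive", "not active"}


def keep_activity(record):
    """Return True if a record passes all four ChEMBL rules."""
    score = record.get("confidence_score")
    pchembl = record.get("pchembl_value")
    # Normalize the comment so "Not Active" and "not active" match alike.
    comment = (record.get("activity_comment") or "").strip().lower()
    return (
        score is not None
        and score >= 7
        and pchembl is not None
        and pchembl >= 5
        and record.get("organism") in ALLOWED_ORGANISMS
        and comment not in EXCLUDED_COMMENTS
    )


def filter_activities(records):
    """Keep only the records that satisfy every rule."""
    return [r for r in records if keep_activity(r)]
```

Records with a missing confidence score or pChEMBL value are dropped here; whether the real workflow does the same is not stated in the source.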

FAERS

Side Effect & Indication Data

  • "Primary Suspect Drug" (PS) only
  • Reported by a physician, pharmacist, or other health professional (HP)
  • Disease names mapped to CUIs
  • Requires secure tunnel for disease manager
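
The first two FAERS rules above can be sketched as a join between the drug and demographic tables. This is an illustration under stated assumptions: the column names (primaryid, role_cod, occp_cod) follow the FAERS quarterly files, and the reporter codes MD, PH, and HP are assumed to stand for physician, pharmacist, and health professional. CUI mapping and the disease manager connection are not shown.

```python
# Assumed FAERS codes: role_cod "PS" = primary suspect drug;
# occp_cod "MD" = physician, "PH" = pharmacist, "HP" = health professional.
TRUSTED_REPORTERS = {"MD", "PH", "HP"}


def filter_faers(drug_rows, demo_rows):
    """Keep primary-suspect drug rows from reports filed by trusted reporters.

    drug_rows and demo_rows are lists of dicts (a hypothetical in-memory
    form of the FAERS DRUG and DEMO tables), linked by "primaryid".
    """
    trusted_reports = {
        row["primaryid"]
        for row in demo_rows
        if row.get("occp_cod") in TRUSTED_REPORTERS
    }
    return [
        row
        for row in drug_rows
        if row.get("role_cod") == "PS" and row["primaryid"] in trusted_reports
    ]
```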

IRIS, PPRTV, RAIS

Toxicity & Contaminant Data

  • Discard "Low Confidence" entries
  • InChIKeys recovered via DTXSID/CAS
  • Disease names harmonized via NLP
  • Separate processing for carcinogenic data
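
A rough sketch of the first two rules for IRIS, PPRTV, and RAIS follows. Everything named here is hypothetical: the entry fields (confidence, dtxsid, cas, inchikey), the "Low Confidence" label match, and the two lookup dictionaries that stand in for whatever DTXSID and CAS mapping the workflow actually uses. The NLP-based disease harmonization and the separate carcinogenicity processing are not covered.

```python
def recover_inchikey(entry, by_dtxsid, by_cas):
    """Look up an InChIKey by DTXSID first, then by CAS number.

    by_dtxsid and by_cas are hypothetical dicts mapping identifiers to
    InChIKeys. Returns None when neither identifier is found.
    """
    return by_dtxsid.get(entry.get("dtxsid")) or by_cas.get(entry.get("cas"))


def process_toxicity_entries(entries, by_dtxsid, by_cas):
    """Drop low-confidence entries and fill in missing InChIKeys."""
    processed = []
    for entry in entries:
        label = (entry.get("confidence") or "").strip().lower()
        if label == "low confidence":
            continue  # rule: discard "Low Confidence" entries
        # Keep an existing InChIKey; otherwise try to recover one.
        inchikey = entry.get("inchikey") or recover_inchikey(entry, by_dtxsid, by_cas)
        # Build a new dict so the caller's input is left unmodified.
        processed.append({**entry, "inchikey": inchikey})
    return processed
```

Entries whose InChIKey cannot be recovered are kept with inchikey set to None in this sketch, so they can be reviewed later; dropping them instead would be an equally plausible choice.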

Final Output: Summary Tables

The workflow culminates in a set of clean, integrated summary tables, ready for analysis and downstream applications.

📜

chemical_summary

🧪

chemical_reac_summary

💊

chemical_indi_summary

🔬

chemical_target_summary

Key Metrics

A glimpse into the scale of our data.

Entities per Source

Chemical Identifier Distribution