dbZACH FAQ

The dbZACH system is a bioinformatics infrastructure supporting toxicogenomics. It consists of 1) an Oracle 9i relational database, 2) online interfacing systems, and 3) Java-based data mining and analysis tools. Administrative oversight and upkeep of dbZACH is performed by the Zacharewski Lab Bioinformatics Group.

The goals of the dbZach project are to 1) provide a model toxicogenomics database, 2) provide facilities for the modeling of toxicogenomics data, 3) provide a centralized source of biological knowledge to streamline data mining and knowledge-based discovery of chemical mechanisms of action, and 4) provide a learning environment for developing bioinformatics algorithms and tools.

dbZach has been, and remains, supportive of the MAGE efforts. MAGE-ML is the data sharing standard used by dbZach, and tools for exporting MAGE-ML data from dbZach are currently in development.


dbZACH Design Philosophy

The dbZach design philosophy has several parts:

  • Modular structure
  • Fast and efficient query architecture
  • Investigator-friendly interfaces
  • The most complete and up-to-date gene annotation data available to investigators
  • Intelligent correlation and comparison tools for genomic studies
  • Support for the MIAME Standards
  • "One-stop-shopping" convenience for data analysis needs (Genomic data, pathway inference, and gene regulation data all in one location!)
  • A move towards Systems Toxicology

The dbZach architecture can be used in both small labs and large enterprise installations; the primary difference between the two is the RDBMS used.


dbZACH Query Model

 

dbZACH consists of 10 subsystems:

 

  • Clones
  • Microarray
  • Gene Annotation
  • Sample Annotation
  • Toxicology
  • Protocols
  • Real-Time PCR
  • Promoter
  • Pathway
  • Protein

The following figure depicts how several of these subsystems interact.

 

 



Clones Subsystem

The Clones Subsystem contains information concerning the clones residing within the lab. This information includes:

  • GenBank Accession Number
  • IMAGE Clone ID
  • Physical location within the lab

The information in this subsystem is used by our automated update scripts to ensure the accuracy of our gene annotation data.
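As an illustration only, here is a minimal sketch of how a clone record, and the accession lookup an annotation-update script might start from, could be laid out. It uses Python's built-in sqlite3 as a stand-in for Oracle 9i; the table name `clone`, its columns, and the `accessions_to_refresh` helper are hypothetical and are not the actual dbZach schema.

```python
import sqlite3

# Hypothetical clone table; sqlite3 stands in for Oracle 9i in this sketch.
SCHEMA = """
CREATE TABLE clone (
    clone_id       INTEGER PRIMARY KEY,
    image_clone_id TEXT UNIQUE,     -- IMAGE Clone ID
    genbank_acc    TEXT NOT NULL,   -- GenBank accession number
    plate          TEXT,            -- physical location: storage plate
    well           TEXT             -- physical location: well on that plate
)
"""


def make_db():
    """Create an in-memory database containing the hypothetical clone table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def accessions_to_refresh(conn):
    """Return the distinct GenBank accessions an annotation-update script would re-query."""
    rows = conn.execute("SELECT DISTINCT genbank_acc FROM clone ORDER BY genbank_acc")
    return [acc for (acc,) in rows]
```

An update script could then pass each returned accession to whichever annotation source it refreshes from; that retrieval step is not shown.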

Schema:

Click here for the schema



Microarray Subsystem

The Microarray Subsystem contains information concerning the custom microarrays constructed and used within the lab. Information housed within this subsystem includes:

  • Names and print versions of microarrays used in experiments
  • Microarray images (raw data)
  • Feature coordinates -- physical location of cDNAs on the array
  • Results from the analysis of scanned microarrays
  • Mapping from the microarray experiment and sample to the microarray

dbZach stores the actual TIFF microarray images! Using facilities within Oracle 9i, we are able to acquire all metadata associated with each TIFF and to capture image tiles as small as a single 1x1 pixel! Using the Java Microarray-Feature Inspection Tool (JM-FIT; part of the dbZach system's Java Microarray Analysis System, currently in development), investigators are able to assess the quality of their microarray data on a per-feature, per-pixel basis!
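The per-feature, per-pixel inspection described above can be illustrated with a small sketch. It assumes the TIFF has already been decoded into a 2-D grid of pixel intensities (a list of rows) and that each feature is described by a hypothetical centre `(row, col)` and radius derived from its feature coordinates. The `feature_tile` and `feature_stats` names and the coefficient-of-variation measure are illustrative choices, not JM-FIT's actual algorithm.

```python
from statistics import mean, pstdev


def feature_tile(image, row, col, radius):
    """Cut a square tile centred on (row, col), clipped to the image bounds.

    A radius of 0 yields a single 1x1 pixel tile.
    """
    top, bottom = max(0, row - radius), min(len(image), row + radius + 1)
    left, right = max(0, col - radius), min(len(image[0]), col + radius + 1)
    return [r[left:right] for r in image[top:bottom]]


def feature_stats(tile):
    """Summarise a tile: pixel count, mean intensity and coefficient of variation (CV)."""
    pixels = [p for r in tile for p in r]
    m = mean(pixels)
    # A high CV suggests an uneven (possibly defective) spot; an all-zero tile gets inf.
    cv = pstdev(pixels) / m if m else float("inf")
    return {"n_pixels": len(pixels), "mean": m, "cv": cv}
```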

The results of statistical analyses conducted on scanned arrays are related to the Clones and Genes Subsystems to facilitate identification of gene functions and to help investigators assess the biological significance of microarray results. Current and future bioinformatics projects involving this subsystem include:

  • Automated quality control analysis -- from spot inspection to whole array quality control
  • Automated identification of problem arrays (fits in with above project)
  • Creating statistical functionality within Oracle to automatically perform statistical analyses of microarray datasets
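For the automated identification of problem arrays, one simple approach is to count features whose quality metric exceeds a cutoff and flag any array in which the fraction of such features is too high. The sketch below assumes per-feature CV values are already available for each array; the function name and both thresholds (`max_cv`, `max_bad_fraction`) are hypothetical and would need tuning against real data.

```python
def flag_problem_arrays(feature_cvs_by_array, max_cv=0.5, max_bad_fraction=0.1):
    """Return the IDs of arrays whose share of high-CV features is too large.

    feature_cvs_by_array maps an array ID to a list of per-feature CV values.
    """
    flagged = []
    for array_id, cvs in sorted(feature_cvs_by_array.items()):
        if not cvs:
            flagged.append(array_id)  # no usable features at all
            continue
        bad = sum(1 for cv in cvs if cv > max_cv)
        if bad / len(cvs) > max_bad_fraction:
            flagged.append(array_id)
    return flagged
```

Keeping the thresholds as parameters makes it easy to compare how many arrays each setting would reject before adopting one.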

Schema:

Click here for schema.



Genes Subsystem

The Genes Subsystem catalogs all of the genes represented on our microarrays and indexes various kinds of annotation. Annotation data include the following:

  • Functional Analysis
  • Gene Ontology
  • Enzyme Commission
  • Gene Names (primary as well as secondary)
  • Gene Abbreviations (primary as well as secondary)
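Storing primary and secondary names and abbreviations usually means a separate synonym table, so that a gene can be found by any of its names. The sketch below shows one way to do this; sqlite3 stands in for Oracle, and the `gene` and `gene_name` tables, the `is_primary` flag and `find_gene` are hypothetical rather than taken from the dbZach schema.

```python
import sqlite3

# Hypothetical gene and synonym tables; sqlite3 stands in for Oracle 9i.
SCHEMA = """
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY);
CREATE TABLE gene_name (
    gene_id    INTEGER NOT NULL REFERENCES gene(gene_id),
    name       TEXT NOT NULL,
    kind       TEXT NOT NULL CHECK (kind IN ('name', 'abbreviation')),
    is_primary INTEGER NOT NULL DEFAULT 0   -- 1 = primary, 0 = secondary
);
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def find_gene(conn, any_name):
    """Find genes by any primary or secondary name or abbreviation (case-insensitive).

    Returns (gene_id, primary abbreviation) pairs; because of the inner join,
    genes without a primary abbreviation are not returned.
    """
    sql = """
    SELECT DISTINCT hit.gene_id, p.name
    FROM gene_name AS hit
    JOIN gene_name AS p
      ON p.gene_id = hit.gene_id AND p.kind = 'abbreviation' AND p.is_primary = 1
    WHERE lower(hit.name) = lower(?)
    ORDER BY hit.gene_id
    """
    return conn.execute(sql, (any_name,)).fetchall()
```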

Schema:

Click here for schema.

 



Genes-Clones Subsystem Interface

This figure illustrates how the Genes and Clones subsystems interface with each other.

Schema:

Click here for schema.
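A common way to connect two subsystems like these is a link table, since one gene can be represented by several clones and a clone's gene assignment may change when annotation is updated. The following sketch assumes that design; the `clone`, `gene` and `clone_gene` tables and the `genes_for_accession` function are hypothetical stand-ins, with sqlite3 in place of Oracle.

```python
import sqlite3

SCHEMA = """
CREATE TABLE clone (clone_id INTEGER PRIMARY KEY, genbank_acc TEXT NOT NULL);
CREATE TABLE gene  (gene_id  INTEGER PRIMARY KEY, symbol TEXT NOT NULL);
-- Link table: one row per clone-to-gene assignment.
CREATE TABLE clone_gene (
    clone_id INTEGER NOT NULL REFERENCES clone(clone_id),
    gene_id  INTEGER NOT NULL REFERENCES gene(gene_id),
    PRIMARY KEY (clone_id, gene_id)
);
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def genes_for_accession(conn, accession):
    """Return the gene symbols assigned to the clone(s) carrying this GenBank accession."""
    sql = """
    SELECT DISTINCT g.symbol
    FROM clone AS c
    JOIN clone_gene AS cg ON cg.clone_id = c.clone_id
    JOIN gene AS g ON g.gene_id = cg.gene_id
    WHERE c.genbank_acc = ?
    ORDER BY g.symbol
    """
    return [symbol for (symbol,) in conn.execute(sql, (accession,))]
```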

 


Sample Annotation Subsystem

The Sample Annotation Subsystem is part of our implementation of the MIAME standards. This subsystem uses the controlled vocabulary from MIAME, as well as the object relations specified in the MAGE-OM. In our implementation, we have separated the in vitro- and in vivo-specific aspects to reduce redundancy at the higher-order levels (e.g., biosource specification, species information). This also allows us to define parts of the MIAME standard more specifically to suit our purposes (e.g., media formulations, diet specifications). Because these extensions are kept separate from the MIAME-required data, we remain compliant with the MAGE-OM. Another way of thinking of this is that we are extending the MAGE-OM, much as a programmer extends a class in C++ or Java.
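The class-extension analogy can be made concrete with a short sketch. The class and field names below (`BioSource`, `InVivoBioSource`, `InVitroBioSource`, `diet`, `medium`) are hypothetical simplifications loosely inspired by the MAGE-OM biosource concept; they are not the actual MAGE-OM classes or dbZach tables. The point is only that shared, MIAME-required attributes live in the base type, while in vivo and in vitro details live in subclasses.

```python
from dataclasses import dataclass, fields


@dataclass
class BioSource:
    """Attributes shared by every biosource (the higher-order, MIAME-required level)."""
    biosource_id: int
    species: str
    sex: str


@dataclass
class InVivoBioSource(BioSource):
    """In vivo extension: adds animal-specific details such as diet."""
    age_days: int
    diet: str


@dataclass
class InVitroBioSource(BioSource):
    """In vitro extension: adds culture-specific details such as media formulation."""
    cell_line: str
    medium: str


def shared_fields(obj):
    """Return only the base-class attributes, i.e. what every biosource must report."""
    return {f.name: getattr(obj, f.name) for f in fields(BioSource)}
```

Code that only understands `BioSource` keeps working when a new subclass is added, which mirrors how extending the model need not break compliance with the base standard.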

To reduce redundancy within the database, the Sample Annotation Subsystem is also used with the Toxicology Subsystem. This also enhances our knowledge base, as we can better model chemical effects within an organism or cell culture. Since we use the same Biosource table for both genomics and toxicology tests, we can model data taken from the same animal much more efficiently. This is especially useful when identifying outliers or idiosyncratic responses: if genomics identifies an idiosyncratic or outlier response in a given tissue, we can analyze other data from that animal to identify possible causes or mechanisms.
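Because genomics and toxicology results are keyed on the same biosource, gathering everything measured on one animal reduces to lookups on a shared identifier. The sketch below assumes hypothetical `biosource`, `expression_result` and `tox_result` tables, each result table carrying a `biosource_id` foreign key; none of these names come from the dbZach schema, and sqlite3 stands in for Oracle.

```python
import sqlite3

SCHEMA = """
CREATE TABLE biosource (biosource_id INTEGER PRIMARY KEY, animal_label TEXT);
CREATE TABLE expression_result (
    biosource_id INTEGER REFERENCES biosource(biosource_id),
    tissue TEXT, gene_symbol TEXT, log_ratio REAL
);
CREATE TABLE tox_result (
    biosource_id INTEGER REFERENCES biosource(biosource_id),
    endpoint TEXT, value REAL
);
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def animal_profile(conn, biosource_id):
    """Collect the genomic and toxicology results recorded for one biosource."""
    expr = conn.execute(
        "SELECT tissue, gene_symbol, log_ratio FROM expression_result "
        "WHERE biosource_id = ? ORDER BY tissue, gene_symbol",
        (biosource_id,),
    ).fetchall()
    tox = conn.execute(
        "SELECT endpoint, value FROM tox_result WHERE biosource_id = ? ORDER BY endpoint",
        (biosource_id,),
    ).fetchall()
    return {"expression": expr, "toxicology": tox}
```

When an outlier shows up in the expression data, the same `biosource_id` immediately gives access to that animal's other measurements.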

Schema:

Click here for schema.

 


Toxicology Subsystem

Part of the design philosophy of dbZach is the idea of "1 animal, multiple tests, lots of data, 1 database." That does not mean that large labs, or experimenters who use many animals, cannot use dbZach; quite the contrary! What it does mean is that we are committed to creating a technology that allows investigators to perform as many assays on each biosource as possible, thereby increasing the power of their analyses. When the goal is moving towards a Systems Toxicology approach, it is far better to perform genomic and proteomic assays within the same experimental unit. The dbZach system is one part of a multi-pronged strategy to help labs move towards this goal.

The Toxicology Subsystem indexes end-point toxicity measures to facilitate correlation with gene expression analyses. Keeping the results of a battery of toxicology tests in the database facilitates comparisons of chemical mechanisms of action, and supports functional toxicogenomic and chemoinformatic investigations of structure-activity relationships (SARs). By relating the Sample Annotation Subsystem to the Toxicology Subsystem, we are able to capture several different kinds of data from one biosource (e.g., cells, tissue, an animal), regardless of whether it comes from an in vivo or in vitro system.
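To correlate an end-point toxicity measure with gene expression, one straightforward option is Pearson's correlation across the biosources that have both measurements. The sketch assumes each measurement has already been retrieved as a dictionary keyed by biosource ID; the function names are hypothetical, and Pearson's r is only one possible choice of statistic, not necessarily the one dbZach uses.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def endpoint_expression_correlation(tox_by_biosource, expr_by_biosource):
    """Correlate a toxicity endpoint with one gene's expression over shared biosources.

    Only biosources present in both dictionaries are used; returns (r, n).
    """
    shared = sorted(set(tox_by_biosource) & set(expr_by_biosource))
    xs = [tox_by_biosource[b] for b in shared]
    ys = [expr_by_biosource[b] for b in shared]
    return pearson(xs, ys), len(shared)
```

Restricting to shared biosources is where measuring many endpoints on the same animal pays off: every animal with both measurements contributes a data point.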

Pathology Data: The Toxicology Subsystem makes use of the National Toxicology Program (NTP) Pathology Code Tables. In brief, the Toxicology Subsystem can manage data from tissue sections sent for histopathological analysis. The pathologist will score the tissue and make comments using the controlled vocabulary from the NTP Pathology Code Tables, and dbZach will store these data along with the actual images in the database! From a computational perspective, this will facilitate the creation of new histopathology image-based data mining tools.

From a practical standpoint, the pathology subpart enables the kind of pathology-to-omics data mining that will be crucial for knowledge- and information-based mechanistic studies. It is envisioned that analysts will be able to execute one or a few SQL queries and mine information from the database much faster than by poring over pages of pathology reports. These efforts will also lead to the development of new data mining and visualization tools to facilitate analysis.
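As an example of the kind of single SQL query meant here, the sketch below finds every biosource in which a given lesion was recorded in a given tissue, together with its worst severity score. The table and column names, the numeric severity scale and the placeholder lesion codes are all hypothetical; real entries would use codes from the NTP Pathology Code Tables, which are not reproduced here.

```python
import sqlite3

SCHEMA = """
CREATE TABLE path_finding (
    biosource_id INTEGER NOT NULL,
    tissue       TEXT NOT NULL,
    lesion_code  TEXT NOT NULL,   -- code from a controlled vocabulary (e.g. NTP tables)
    severity     INTEGER,         -- pathologist's score; the scale is an assumption
    comment      TEXT,
    image        BLOB             -- optional section image stored with the finding
);
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def biosources_with_lesion(conn, tissue, lesion_code, min_severity=1):
    """Return (biosource_id, worst severity) for matching findings, worst first."""
    sql = """
    SELECT biosource_id, MAX(severity) AS worst
    FROM path_finding
    WHERE tissue = ? AND lesion_code = ? AND severity >= ?
    GROUP BY biosource_id
    ORDER BY worst DESC, biosource_id
    """
    return conn.execute(sql, (tissue, lesion_code, min_severity)).fetchall()
```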

Schema for Pathology Subpart: click here

Clinical Chemistry: The Toxicology Subsystem makes use of the Biosource from the Sample Annotation Subsystem and can store clinical chemistry parameters. Our current architecture allows for any number of clinical chemistry parameters to be analyzed.
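One common way to allow any number of clinical chemistry parameters is a long layout with one row per biosource per parameter, rather than one column per parameter, so that adding a new assay needs no schema change. Whether dbZach uses exactly this layout is an assumption; the `clin_chem` table and the `pivot` helper below are a hypothetical sketch, with sqlite3 in place of Oracle.

```python
import sqlite3

SCHEMA = """
CREATE TABLE clin_chem (
    biosource_id INTEGER NOT NULL,
    parameter    TEXT NOT NULL,   -- e.g. an enzyme or metabolite name
    value        REAL NOT NULL,
    unit         TEXT NOT NULL,
    PRIMARY KEY (biosource_id, parameter)
);
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def pivot(conn):
    """Return {biosource_id: {parameter: value}}; any number of parameters fits."""
    table = {}
    for bid, param, value in conn.execute(
        "SELECT biosource_id, parameter, value FROM clin_chem ORDER BY biosource_id, parameter"
    ):
        table.setdefault(bid, {})[param] = value
    return table
```

The trade-off is that per-parameter constraints (valid ranges, expected units) must be enforced in a lookup table or in code rather than by column definitions.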

Schema:

Click here for schema.

 



Interfacing with dbZACH

Below is an image that illustrates how dbZACH is queried by both internal and external users.

As illustrated in the figure above, dbZACH is queried primarily through ASP/JavaScript interfaces that are processed by an application server. Output is sent to the client browser over the World Wide Web (WWW) as HTML. One of the items the Bioinformatics Group is currently examining is a migration from HTML to XML.



Document Information

This page last modified: Thursday Mar 6 15:35:21 EDT 2002
Questions/Comments? Please contact the Webmaster