C4 – Transcriptomics bioinformatics best practices in toxicogenomics for regulatory application

Principal Investigator

Dr Florian Caiment
Maastricht University (UM)
School for Oncology & Developmental Biology

Collaborators

Weida Tong, Division Director, U.S. Food and Drug Administration (FDA), Washington, D.C., Maryland, USA
Leming Shi, Professor and Director, Fudan University (FU), Shanghai, China
Tim Gant, Head of Toxicology, Public Health England (PHE), Professor of Biochemistry and Physiology; University of Surrey, Oxfordshire, UK
Bruce Seligman, CSO, BioSpyder, Tucson, AZ, USA

Description

Most classes of biological molecules (genes, proteins, metabolites, etc.) can now be subjected to high-throughput analysis, and the technologies used to do this are collectively known as “omics”. The use of these technologies in toxicology has given rise to toxicogenomics, where they have been used extensively to investigate the modes of action (MoAs) of many different compounds and to classify the putative toxicity of unknown substances.

However, despite the growing number of scientific publications using omics in toxicology, and with the exception of a few cases in drug development, no omics data have to date been used to support a chemical regulatory application, for instance under REACH. Regulatory agencies mainly report two major issues concerning the use of omics technologies: 1/ the high technical variance of each technological platform, which sometimes makes the data difficult to correlate within and between platforms; 2/ the impact that the choice of bioinformatics analysis pipeline has on the results, reflected in pipeline-dependent differences in the lists of biological systems significantly affected by the compounds of interest, which makes the “truth” of toxicity difficult to assess or trust from omics data.

While several scientific consortia have been set up to tackle these two main issues, notably on microarray quality control (MAQC-I and MAQC-II [1, 2]) followed by sequencing quality control (SEQC [3]), both leading to major publications in high-impact journals, no consensus on an omics data analysis framework (ODAF) for regulatory application has yet been reached. This exercise is now being undertaken within the EAGMST (Adverse Outcome Pathways, Molecular Screening and Toxicogenomics) program of the OECD, and the purpose of the current proposal is to generate further data to support this work. The need for it is exemplified by the work cited in the C4 RfP proposal and currently being published by ECETOC. To date, no OECD guidance documents are available for the generation and analysis of omics data. One of the major roadblocks is the lack of a standardized procedure for analyzing the data: different conclusions can be derived from one and the same dataset depending on the transformations and statistical procedures used. This creates an issue for regulators, who cannot assess whether the results generated from such data support the conclusions being drawn and have no means to verify those conclusions.

In this context, our project proposal aims to bring together toxicogenomics experts to test and further develop a regulatory ODAF (R-ODAF) proposal for the toxicogenomics community, with the ambition of enabling regulatory bodies to consider omics as a relevant data type to support compound submissions.

We will focus our project on transcriptomics data for two reasons. First, transcriptomics offers the most comprehensive platform available. Proteomics and metabolomics still struggle to identify and measure the full expressed proteome or metabolome (especially for low-abundance entities), whereas sequencing-based transcriptomics methods can measure the expression level of each individual gene in the whole transcriptome, including gene isoforms and post-transcriptional modifications (such as RNA editing). Second, transcriptomics is by far the most abundant data type available in toxicogenomics, which makes it essential to establish an analysis framework for this data type in particular.

Focusing then on transcriptomics, our project objectives are to:

Identify and collect relevant toxicogenomics datasets on the three major transcriptomics platforms: microarrays, RNA-seq and the new TempO-Seq® technology [4] (from BioSpyder). The main objective here is to assemble data on as many toxic compounds as possible, covering different durations and doses of exposure. A particular focus will be on obtaining similar datasets across the three platforms. The collected datasets will, by necessity, guide the study towards the most widely used technology within each platform (e.g. Affymetrix for arrays or Illumina RNA-Seq for sequencing) and towards emerging platforms (e.g. TempO-Seq) for which smaller datasets are available, in order to provide a generalized R-ODAF.
Analyze individual platform data using a variety of methods, including state-of-the-art analysis methods (such as limma [5] for microarrays or edgeR [6] / DESeq2 [7] for sequencing) together with less widely used or recently published pipelines. The results of these analyses will be compared with the Reference Baseline Analysis established by Tim Gant and Weida Tong during the ECETOC (European Centre for Ecotoxicology and Toxicology of Chemicals) workshops.
Establish a common foundation method that defines how to recognize and discard poor-quality samples (e.g. by identifying outliers) and how to set thresholds and parameters for identifying differential expression (e.g. p-value, multiple testing correction method, fold change) for each platform.
Propose the R-ODAF to the community, pointing out specifically the best practice guidelines to apply for each of the three platforms and being transparent about which parameters of the R-ODAF are generalizable between platforms and which platform-specific criteria need to be incorporated, and integrate this work with that of the OECD EAGMST group.
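
To make the kind of parameters named in the third objective concrete, the following is a minimal Python sketch of a differential expression call that combines a multiple testing correction with a fold-change filter. It is an illustration only and not the R-ODAF itself: the Benjamini-Hochberg procedure is chosen here as one common correction method, and the function names (`benjamini_hochberg`, `call_degs`), the cutoffs (adjusted p-value of 0.05, absolute log2 fold change of 1.0) and the toy gene names are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def call_degs(
    results: Dict[str, Tuple[float, float]],
    fdr_cutoff: float = 0.05,       # hypothetical threshold, for illustration
    min_abs_log2fc: float = 1.0,    # hypothetical threshold, for illustration
) -> List[str]:
    """Select genes passing both the adjusted p-value and fold-change filters.

    `results` maps a gene name to (log2 fold change, raw p-value).
    """
    genes = list(results)
    padj = benjamini_hochberg([results[g][1] for g in genes])
    return [
        g
        for g, q in zip(genes, padj)
        if q <= fdr_cutoff and abs(results[g][0]) >= min_abs_log2fc
    ]


# Toy input: gene -> (log2 fold change, raw p-value). Values are made up.
toy_results = {
    "GENE_A": (2.0, 0.001),
    "GENE_B": (0.3, 0.002),
    "GENE_C": (-1.5, 0.02),
    "GENE_D": (1.2, 0.6),
}
selected = call_degs(toy_results)
print(selected)
```

Changing either cutoff, or swapping the correction method, changes which genes are reported from the same input, which is the pipeline dependence that a shared framework is meant to make explicit.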

Read the Executive Summary here.

Related Publications

Presentation:

F. Caiment. Towards the development of an omics data analysis framework for regulatory application. MAQC Society 2nd Annual Meeting, February 2018, Shanghai, China.

Timeline: April 2018 > April 2020

LRI funding: € 250 000