Cefic-lri Programme | European Chemical Industry Council

C4 – Transcriptomics bioinformatics best practices in toxicogenomics for regulatory application

Principal Investigator

Dr Florian Caiment

Maastricht University (UM)
School for Oncology & Developmental Biology
Universiteitssingel 40
6229 ER Maastricht
The Netherlands
Tel: +31-43-3882127
E-mail: Florian.caiment@maastrichtuniversity.nl


Weida Tong
Division Director
U.S. Food and Drug Administration (FDA)
Washington, D.C., Maryland, USA

Leming Shi
Professor and Director
Fudan University (FU)
2005 Songhu Road
Yangpu District, Shanghai 200438, China

Tim Gant
Head of Toxicology – Public Health England (PHE)
Professor of Biochemistry and Physiology, University of Surrey
Public Health England (Government) and University of Surrey (Academic)
Centre for Radiation, Chemical and Environmental Hazards, Harwell Science and Innovation Campus, Oxfordshire, UK
Tel: 01235 825139

Bruce Seligmann
1700 E. 18th St., Suite 100, Tucson, AZ 85719



Most classes of biological molecules (genes, proteins, metabolites, etc.) can now be analyzed at high throughput, and the technologies used to do so are collectively known as “omics”. The use of these technologies in toxicology has given rise to toxicogenomics, where they have been used extensively to investigate the modes of action (MoAs) of many different compounds and to classify the putative toxicity of unknown substances.

However, despite the growing number of scientific publications using omics in toxicology, and with the exception of a few cases in drug development, no omics data have been used to date to support a chemical regulatory application, for instance under REACH. Regulatory agencies mainly report two major issues with omics technologies: (1) the high technical variance of each technological platform, which sometimes makes the data difficult to correlate within and between platforms; and (2) the impact of the choice of bioinformatics analysis pipeline on the results, reflected in pipeline-dependent differences in the lists of biological systems significantly affected by the compounds of interest, which makes the “truth” of toxicity difficult to assess or trust from omics data.

While several scientific consortia have tackled these two main issues, notably the microarray quality control projects (MAQC-I and MAQC-II [1, 2]) followed by the sequencing quality control project (SEQC [3]), both leading to major publications in high-impact journals, no consensus on an omics data analysis framework (ODAF) for regulatory application has yet been achieved. This exercise is now being undertaken within the EAGMST (Adverse Outcome Pathways, Molecular Screening and Toxicogenomics) program of the OECD, and the purpose of the current proposal is to generate further data to support this work. The need is exemplified by the work cited in the C4 RfP proposal and currently being published by ECETOC. To date, no OECD guidance documents are available for the generation and analysis of omics data. One of the major roadblocks is the lack of a standardized procedure for analyzing the data: different conclusions can be derived from one and the same dataset depending on the transformations and statistical procedures used. This is a problem for regulators, who cannot assess whether the results generated from such data support the conclusions being drawn and have no means of verifying those conclusions.

In this context, our project aims to bring together toxicogenomics experts to test and further develop a regulatory ODAF (R-ODAF) proposal for the toxicogenomics community, with the ambition of enabling regulatory bodies to consider omics as a relevant data type in support of compound submissions.

We will focus on transcriptomics data for two reasons. First, transcriptomics offers the most comprehensive platform available: sequencing-based transcriptomics methods can measure the expression level of every gene in the transcriptome, including gene isoforms and post-transcriptional modifications (such as RNA editing), whereas proteomics and metabolomics still struggle to identify and measure the full expressed proteome or metabolome, especially for low-abundance entities. Second, transcriptomics is by far the most abundant data type in toxicogenomics, which makes establishing an analysis framework for it a priority.

Focusing then on transcriptomics, our project objectives are to:

  • Identify and collect relevant toxicogenomics datasets on the three major transcriptomics platforms: microarrays, RNA-seq and the new TempO-Seq® technology [4] (from BioSpyder). The main objective here is to assemble data on as many toxic compounds as possible, covering different durations and doses of exposure, with a particular focus on obtaining similar datasets across the three platforms. By necessity, the collected datasets will guide the study toward the most used technology within each platform (e.g. Affymetrix for arrays or Illumina for RNA-seq) and toward emerging platforms (e.g. TempO-Seq), for which smaller datasets are available, in order to provide a generalized R-ODAF.
  • Analyze the data from each platform using a variety of methods, including state-of-the-art analysis methods (such as limma [5] for microarrays or edgeR [6] and DESeq2 [7] for sequencing) together with less widely used or recently published pipelines. The results of these analyses will be compared with the Reference Baseline Analysis established during the ECETOC workshops (European Centre for Ecotoxicology and Toxicology of Chemicals, http://www.ecetoc.org/) by Tim Gant and Weida Tong, both collaborators on the present LRI proposal.
  • Establish a common foundation method that defines how to recognize and discard poor-quality samples (e.g. by identifying outliers) and how to set the thresholds and parameters for identifying differential expression (e.g. p-value, multiple testing correction method, fold change) for each platform.
  • Propose the R-ODAF to the community, specifying the best-practice guidelines to apply for each of the three platforms, being transparent about which parameters of the R-ODAF generalize across platforms and which platform-specific criteria need to be incorporated, and integrate this work with that of the OECD EAGMST group.
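To illustrate why the choice of p-value cutoff, multiple testing correction and fold-change threshold matters, the sketch below applies a Benjamini-Hochberg correction followed by a fold-change filter to a toy list of genes. It is only an illustration under stated assumptions: the function names, the gene labels and the cutoffs (adjusted p-value of 0.05, absolute log2 fold change of 1) are hypothetical and are not the parameters of the R-ODAF or of any pipeline named above.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def call_differential(genes, p_values, log2_fold_changes,
                      fdr_cutoff=0.05, min_abs_log2fc=1.0):
    """Keep genes passing both cutoffs (illustrative default values only)."""
    adjusted = benjamini_hochberg(p_values)
    return [
        gene
        for gene, q, lfc in zip(genes, adjusted, log2_fold_changes)
        if q <= fdr_cutoff and abs(lfc) >= min_abs_log2fc
    ]


# Toy input: hypothetical gene labels, raw p-values and log2 fold changes.
genes = ["geneA", "geneB", "geneC", "geneD"]
p_values = [0.001, 0.02, 0.04, 0.5]
log2_fc = [2.1, -1.4, 0.3, 1.8]

selected = call_differential(genes, p_values, log2_fc)
print(selected)
```

In this toy example geneC has a raw p-value below 0.05 but an adjusted p-value above it, so whether it is reported depends entirely on whether and how the correction is applied. This is the kind of pipeline-dependent difference a common framework is meant to remove.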

Related Publications

Florian Caiment / UM:

Caiment F, Gaj S, Claessen S, Kleinjans J. High-throughput data integration of RNA-miRNA-circRNA reveals novel insights into mechanisms of benzo[a]pyrene-induced carcinogenicity. Nucleic Acids Res. [15]

Hendrickx DM, Aerts HJ, Caiment F, Clark D, Ebbels TM, Evelo CT, Gmuender H, Hebels DG, Herwig R, …, Sarkans U, Segura-Lepe MP, Sotiriadou I, Wittenberger T, Wittwehr C, Zanzi A, Kleinjans JC. 2015. diXa: a data infrastructure for chemical safety assessment. Bioinformatics. [17]

Wang C, Gong B, Bushel PR, Thierry-Mieg J, Thierry-Mieg D, …, Caiment F, …, Shi L, Paules RS, Auerbach SS, Tong W. 2014. The concordance between RNA-seq and microarray data depends on chemical treatment and transcript abundance. Nature Biotechnology. [18]

Su Z, Labaj PP, Li S, Thierry-Mieg J, Thierry-Mieg D, Shi W, …, Caiment F, …, Zheng Y, Zhou Y, Zumbo P, Tong W, Kreil DP, Mason CE, Shi L. 2014. A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. Nature Biotechnology. [3]

Caiment F, Tsamou M, Jennen D, Kleinjans J. 2014. Assessing compound carcinogenicity in vitro using connectivity mapping. Carcinogenesis. [19]

Caiment F, Charlier C, Hadfield T, Cockett N, Georges M, Baurain D. 2010. Assessing the effect of the CLPG mutation on the microRNA catalog of skeletal muscle using high-throughput sequencing. Genome Research. [20]

Leming Shi / FU:

Zheng Y, Qing T, Song Y, Zhu J, Yu Y, Shi W, Pusztai L, Shi L. 2015. Standardization efforts enabling next-generation sequencing and microarray based biomarkers for precision medicine. Biomark Med. [21]

Zhang W, Yu Y, Hertwig F, Thierry-Mieg J, Zhang W, …, Berthold F, Wang J, Tong W, Shi L, Peng Z, Fischer M. 2015. Comparison of RNA-seq and microarray-based models for clinical endpoint prediction. Genome Biol. [22]

Xu J, Su Z, Hong H, Thierry-Mieg J, Thierry-Mieg D, Kreil DP, Mason CE, Tong W, Shi L. 2014. Cross-platform ultradeep transcriptomic profiling of human reference RNA samples by RNA-Seq. Sci Data. [23]

Shi L, Campbell G, Jones WD, Campagne F, …, Puri RK, Scherf U, Tong W, Wolfinger RD; MAQC Consortium. 2010. The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nat Biotechnol. [1]

Shi L, Reid LH, …, Zong Y, Slikker W Jr; MAQC Consortium. 2006. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol. [2]

Weida Tong / FDA:

Fang H, Harris SC, Liu Z, Zhou G, Zhang G, Xu J, Rosario L, Howard PC, Tong W. 2016. FDA drug labeling: rich resources to facilitate precision medicine, drug safety, and regulatory science. Drug Discov Today. [24]

Healy MJ, Tong W, Ostroff S, Eichler HG, Patak A, Neuspiel M, Deluyker H, Slikker W Jr. 2016. Regulatory bioinformatics for food and drug safety. Regul Toxicol Pharmacol. [25]

Xu J, Gong B, Wu L, Thakkar S, Hong H, Tong W. 2016. Comprehensive Assessments of RNA-seq by the SEQC Consortium: FDA-Led Efforts Advance Precision Medicine. Pharmaceutics. [26]

Tong W, Ostroff S, Blais B, Silva P, Dubuc M, Healy M, Slikker W. 2015. Genomics in the land of regulatory science. Regul Toxicol Pharmacol. [27]

Bruce Seligmann / TempO-Seq:

Yeakley JM, Shepard PJ, Goyena DE, VanSteenhouse HC, McComb JD, Seligmann BE. 2017. A trichostatin A expression signature identified by TempO-Seq targeted whole transcriptome profiling. PLoS One. [4]

Grimm FA, Iwata Y, Sirenko O, Chappell GA, Wright FA, Reif DM, Braisted J, Gerhold DL, Yeakley JM, Shepard P, Seligmann B, Roy T, Boogaard PJ, Ketelslegers HB, Rohde AM, Rusyn I. 2016. Chemical-biological similarity-based grouping of complex substances as a prototype approach for evaluating chemical alternatives. Green Chem. [28]

Tim Gant / PHE:

Aigner A, Buesen R, Gant T, Gooderham N, Greim H, Hackermüller J, Hubesch B, Laffont M, Marczylo E, Meister G, Petrick JS, Rasoulpour RJ, Sauer UG, Schmidt K, Seitz H, Slack F, Sukata T, van der Vies SM, Verhaert J, Witwer KW, Poole A. 2017. Advancing the use of noncoding RNA in regulatory toxicology: Report of an ECETOC workshop. Regul Toxicol Pharmacol. [29]

Tonge DP, Gant T. 2016. What is normal? Next generation sequencing-driven analysis of the human circulating miRNAome. BMC Mol Biol. [30]

Chapin RE, Boekelheide K, Cortvrindt R, van Duursen MB, Gant T, Jegou B, Marczylo E, van Pelt AM, Post JN, Roelofs MJ, Schlatt S, Teerds KJ, Toppari J, Piersma AH. 2013. Assuring safety without animal testing: the case for the human testis in vitro. Reprod Toxicol. [31]

Zhang S, Gant T. 2004. A statistical framework for the design of microarray experiments and effective detection of differential gene expression. Bioinformatics.

Timeline: July 2018 to July 2020

LRI funding: €250K
