Published ahead of print on October 5, 2007, doi:10.1165/rcmb.2007-0151OC
© 2008 American Thoracic Society DOI: 10.1165/rcmb.2007-0151OC A Functional and Regulatory Map of Asthma1 School of Computer Science and Engineering, 2 Department of Molecular Genetics and Biotechnology, Faculty of Medicine, The Hebrew University, Jerusalem; 3 Department of Computer Science and Applied Mathematics, The Weizmann Institute of Science, Rehovot, Israel; and 4 Dorothy P. and Richard P. Simmons Center for Interstitial Lung Diseases, Division of Pulmonary and Critical Care Medicine, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania Correspondence and requests for reprints should be addressed to Naftali Kaminski, M.D., University of Pittsburgh Medical Center, NW 628 MUH, 3459 5th Avenue, Pittsburgh, PA 15261. E-mail: kaminskin{at}upmc.edu
The prevalence and morbidity of asthma, a chronic inflammatory airway disease, is increasing. Animal models provide a meaningful but limited view of the mechanisms of asthma in humans. A systems-level view of asthma that integrates multiple levels of molecular and functional information is needed. For this, we compiled a gene expression compendium from five publicly available mouse microarray datasets and a gene knowledge base of 4,305 gene annotation sets. Using this collection we generated a high-level map of the functional themes that characterize animal models of asthma, dominated by innate and adaptive immune response. We used Module Networks analysis to identify co-regulated gene modules. The resulting modules reflect four distinct responses to treatment, including early response, general induction, repression, and IL-13–dependent response. One module with a persistent induction in response to treatment is mainly composed of genes with suggested roles in asthma, suggesting a similar role for other module members. Analysis of IL-13–dependent response using protein interaction networks highlights a role for TGF-β1 as a key regulator of asthma. Our analysis demonstrates the discovery potential of systems-level approaches and provides a framework for applying such approaches to asthma.
Key Words: house dust mite IL-13 ovalbumin systems biology TGF-β
Asthma is a chronic lung disease characterized by airway inflammation, hyperresponsiveness, remodeling, and obstruction (1). The lung phenotype in asthma is believed to be determined by the interaction of the environment with the patient's genetic background (2). This interaction leads to a dramatic change in the airway microenvironment that includes activation of inflammatory pathways, recruitment of immune cells that are not usually present in the airway, and a dramatic change in the phenotype of airway resident cells. While individual changes in many of these factors may generate components of the asthmatic phenotype, it is the converging effects of these pathways on recruited and altered cells that determine the patient's disease. The advent of high-throughput technologies for gene and protein profiling has greatly improved our ability to characterize the behavior of genes and proteins in health and disease. Using animal models of allergic airway disease, investigators applied DNA microarrays to identify potential regulators of asthmatic airway inflammation such as C5 (3), ARG1 (4), ADAM8 (5), SPRR2 (6) as well as to explore the pathways activated by IL13 and STAT6 during the development of allergen-induced lung inflammation (7–9). Transcriptional analysis of the response to IL-13, allergen challenge, syncytial virus infection, and corticosteroids in epithelial cells identified multiple and rarely overlapping genes (10–14). While many of these studies used elegant experimental approaches to dissect pathways and to identify and validate potential novel key regulatory molecules, the majority of their insights were obtained using statistical methods that select individual genes based on their relevance to a specific experimental setup, such as a certain knockout or allergen. These approaches, although successful, tend to reduce the complexity of the data and do not provide a global, systems-level view of the process studied. Furthermore, the dependence on gene-level analysis magnifies the impact of the noise generated by the experimental models and the different technical aspects of microarray sample preparation, hybridization, and scanning. Lastly, such methods are not capable of integrating multiple levels of information. Recently, there has been an increased interest in methods that allow a systems view of the studied process, a view that observes not only the components of a system but also their emergent properties. This includes methods that uncover the functional themes that characterize gene expression profiles (15, 16) as well as methods that use advanced computational algorithms for the integration of multiple levels of information (17–19) and identification of regulatory modules in complex tissue and in disease (20). In this study we create a global map of asthma using publicly available gene expression datasets from multiple sources and tools that allow integration of multiple levels of information, such as functional annotations and protein interactions (Figure 1). In addition to providing a comprehensive description of this map, we present examples of novel observations. These observations include individual gene expression heterogeneity of genetically identical animals, a transcriptionally distinct module of known and potentially novel asthma genes, and support for a central role of TGF-β in IL-13–dependent allergic lung inflammation. The map, as well as all datasets, analyses, and additional examples, are available on the interactive AsthmaMap website (http://compbio.cs.huji.ac.il/AsthmaMap).
Datasets We searched NCBI Gene Expression Omnibus (GEO) for all in vivo asthma murine models gene expression datasets, publicly available by June 2006. Five datasets that passed our inclusion criteria (see online supplement) were combined to generate the expression compendium, and an additional dataset published in 2007 was used for independent validation (Table 1). IL-13 (GEO series GSE1301, Wills-Karp and coworkers): 12 lung samples from IL-13 knockout (IL-13–KO) and BALB/cJ wild-type (WT) mice that were treated with house dust mite (HDM) or with PBS as control. RAG (GEO series GSE483, Wills-Karp and colleagues): 7 lung samples from BALB/cJ mice that were treated with ragweed pollen protein plus Alum or with PBS as control. MAH (Murine Airway Hyperresponsiveness, GEO series GSE3184, Wills-Karp and associates): 20 lung samples taken from asthma-sensitive A/J mice and 2 lung samples taken from resistant C3H/MeJ mice. Lungs were harvested at 6 and 24 hours after challenging the mice with HDM/ovalbumin or with PBS as control. DEA (Dissection of Experimental Asthma [4]): 11 lung samples from BALB/cJ mice that were treated with ovalbumin or aspergillus fumigatus (ASP) allergens, or with saline as control. FTM (Focused Transgenic Modeling [8]): 50 lung and tracheal perfusate (TP) samples from four mice strains, which were treated with ovalbumin or with PBS. The four strains are: (1) IL-13+, Stat6+/–; (2) IL-13+, Stat6–/–; (3) IL-13–Epi (IL-13 overexpresses, STAT6 expressed only in epithelial cells); and (4) BALB/cJ WT. The datasets have been previously described (3–5, 8) and recently reviewed by Rolph and coworkers (21). TEST (GEO series GSE6858 [22]): 16 lung samples taken from BALB/cJ recombinase-activating gene–deficient (RD) mice and from WT mice. Lungs were collected 1 day after challenging with ovalbumin or PBS as control. This dataset was used to as an independent validation set. See Table 1 for description of all datasets.
Compendium Generation Transcript levels of four datasets (IL13, RAG, MAH, DEA), generated with Affymetrix GeneChip arrays, were determined from their data image files using RMAExpress (23, 24). Transcript levels of FTM were taken from the original article (8), as this is a two-channel array that cannot be processed using RMAExpress. Each dataset was normalized such that the mean expression of every gene is zero. Two compendiums were created. The first (Table E1 in the online supplement), which was used for the supervised analysis, includes the five datasets and consists of 102 samples and 7,238 genes that have at most two absent calls across all samples (6,963 of the 7,238 genes had no absent calls). For the Module Network analysis, a second compendium was generated from the first four datasets only (Table E2) by choosing all genes that had at most seven absent calls across all samples This unified compendium consists of 8,086 genes and 52 samples (8,084 of the 8,086 genes had no absent calls). FTM dataset was not included in this analysis because its inclusion introduced a bias (see methods portion of the online supplement).
Gene Set Knowledgebase Generation
Data Visualization and Enrichment Analyses Visualization, heatmap generation, and gene set enrichments were performed using Genomica software (http://genomica.weizmann.ac.il/). Differentially expressed gene signatures were calculated with parametric Student's t test that rejects the null hypothesis if the means are not equal. The t test score was controlled for False Discovery Rate (FDR) (31), using ScoreGenes software (http://compbio.cs.huji.ac.il/scoregenes/). Protein interaction regulatory networks were generated using Ingenuity Pathways Analysis (Ingenuity Systems, Redwood City, CA).
Expression Profile of Gene Sets in Individual Animals
Gene Set Enrichment in Experimental Signatures Signatures of differentially expressed genes were identified for each dataset independently. The threshold for including a differentially expressed gene in a signature is relatively liberal (t test P value < 0.05). This choice facilitates identification of significant enrichments even when the gene level changes are relatively mild. The enrichment of gene sets in each signature was calculated with the hypergeometric model (FDR < 5%), relative to the enrichment in the complete dataset. We found 281 gene sets enriched in the signatures, of which 161 are presented in Figure 3 after manual curation that merged redundant gene sets.
Module Networks Analysis The modules and their regulation programs were automatically detected using Module Networks procedure (32). This method, based on probabilistic graphical models, detects modules of co-expressed genes and their shared regulation programs. The regulation program is a small set of genes, which determines the expression level of the module genes using a decision tree structure (regression tree). Given the expression values and a pool of potential regulator genes, a set of modules and their associated regulation programs are automatically inferred by an iterative procedure. This procedure searches for the best gene partition into modules and for the regulation program of each module, while optimizing a target function. The target function is the Bayesian score, derived from the posterior probability of the model (see Ref. 33 for a detailed description of the algorithm). We employed the Module Networks on the second compendium, which consists of 52 samples and 8,086 genes. From them, a pool of 1,764 potential regulators was created by choosing all the genes that carry a regulatory role, according to Gene Ontology annotations. The number of modules was determined as the number that achieved the best Bayesian score during the learning (Figure E1). Of the set of potential regulators, 217 regulators were found to regulate at least one module in the inferred network.
Generation of Module Cluster Map
Module Network Validation We used TEST dataset to independently validate our analysis. This recently published dataset was generated on Affymetrix GeneChip Mouse Genome 430 2.0 Arrays. Cell files were downloaded and normalized using RMAExpress. To measure how well the modules predict the expression of their genes in the new dataset, we measured the correlation of the modules between the new dataset and the old compendium. Pearson correlation was measured for 1,025 genes that participate in the four clusters. Their average expression in the WT treated mice in the TEST data was compared with the average expression in three sample groups from the compendium: the average of OVA-DEA samples, the average of OVA 24-hour BALB-MAH samples, and the average of all treated WT samples.
Ingenuity Protein Network Analysis
AsthmaMap Website The complete map, containing all the pre-processed data (expression compendium, 4,305 gene sets, and new gene lists reported here), is available on the interactive AsthmaMap website (http://compbio.cs.huji.ac.il/AsthmaMap). The website allows visualizing the sets along with the expression patterns of their genes. The sets can be downloaded in a format applicable for Genomica software. In addition, user-defined gene lists can be uploaded and analyzed for enrichment in respect to any of the sets available in this study.
Expression Compendium and Gene Knowledgebase We combined five different studies of pulmonary gene expression from mouse models of asthma. After normalization and filtering (see MATERIALS AND METHODS), we had a unified compendium describing the expression of 7,238 genes in 102 samples (Table E1). The sample set includes treatments of various mouse strains and cell types with different inducers of asthma (Table 1). We analyzed the compendium with a gene knowledgebase that includes 4,305 gene sets from three types of information sources (Table 2): functional annotations derived from different annotation databases such as Gene Ontology, pathway analyses, and disease association; sequence annotations (genes sharing the same predicted cis-regulatory motif in their promoters or genes encoding the same protein domain); and experimental annotations (genes that were differentially expressed between two conditions in other DNA microarray studies). We then identified modules of co-regulated genes using Module Networks analysis (32), and validated them with a new independent validation gene expression dataset. Finally, we proposed potential regulatory networks within these modules, by projecting the discovered modules to protein interaction networks. The complete map, containing all the pre-processed data (expression compendium, 4,305 gene sets, and new gene lists reported here), is available on the interactive AsthmaMap website (http://compbio.cs.huji.ac.il/AsthmaMap).
Gene Set Expression Profiles Reveal Diverse Response in Individual Animals We can see from the expression profiles that most treatments cause an impressive homogenous increase in multiple gene sets, including general immune response, cytokines, chemotaxis, and G protein–coupled receptor signaling (Figures 2A and 2B). This increase can be seen across all animals with an intact IL13-STAT6 pathway. On the other hand, increase in regulation of T cell and B cell activation, or antigen processing and presentation gene sets, seemed to vary among individual animals (Figure 2C). Interestingly, in some treated animals none of the gene sets are increased. Assuming that the experimental annotations are correct and that these are genetically similar to their experimental peers, we can only speculate that this lack of response may suggest a technical cause or an overlooked biological cause, of which experimentalists should be aware.
Emergent Gene Set Enrichment in Experimental Signatures The most immediate conclusion from this view is that the functional profile of genes induced by treatments is drastically different from those repressed. As expected, Gene Ontology gene sets related to antigen processing, hemopoiesis, cytokines, and response to stimulus are induced by all treatments (Figure 3A). Similarly, pathway gene sets like inflammation, cytokines, chemokines, and receptors dominate induced genes across all treatments (Figure 3B). Among the genes decreased, growth factors and development-related genes are dominant (Figures 3C and 3D). Reassuringly, IL-13–regulated gene sets identified in human hyperresponsive airway cells (HAH) stimulated by IL-13 were enriched in most models, but not in IL-13–KO mice (Figure 3E). Protease inhibitor activity and eicosanoid metabolism (arachdionic acid pathway) characterize HDM-induced signature in WT but not IL-13–KO mice (Figure 3F). Recently, there has been an increased interest in the role of TGF-β, a master regulator of fibrosis that suppresses inflammatory response, in asthma (34, 35). Although several TGF-β pathway genes were repressed after treatment and TGF-β1 itself was unchanged, we found that experimental gene sets of genes induced by TGF-β in lung fibroblasts, were enriched in most treated animals with an intact IL-13 pathway (Figure 3G), supporting the notion that indeed TGF-β may regulate gene expression patterns associated with acute inflammatory response. To assess whether the genes induced in every model were enriched with genes known to be associated with human disease we generated gene sets based on "Genetic Association" database (http://geneticassociationdb.nih.gov/). Indeed, sets of genes known to be associated with rheumatoid arthritis, inflammation, and asthma were all enriched in signatures increased by treatment (Figure 3H; see AsthmaMap website for detailed lists).
Unsupervised Analysis Detects Four Distinct Responses In this analysis we used only four datasets, as the fifth was hybridized on a different platform, and its inclusion introduced a strong bias (see online supplement). This second compendium describes the expression levels of 8,086 genes in 52 samples (Table E2). Among these genes, we defined 1,764 genes that carry a regulatory role according to Gene Ontology as potential regulators (Table E3). We achieved the best Bayesian score when learning with 61 modules (Figure E1). Examining the average expression of these modules (Figure 4A), we obtained four clusters of modules that could be characterized by their overall response. Global induction following a treatment (Figure 4A, Cluster I) is characterized by a strong induction by any type of treatment. It is enriched for general components of the immune response: innate and adaptive responses, complement system; chemokines and cytokines, and their receptors (Figure 4B and Table 3). In addition, the cluster of modules is enriched for genes with potentially direct correlation with asthma, such as IL-17 signaling pathway and eicosanoid metabolism.
The other three responses are as follows. Acute response to ovalbumin (Figure 4A, Cluster II) enriched for chemokines, cytokines and their receptors, inflammatory and DNA damage signaling pathways (nitric oxide, JAK-STAT, NF- B, Ca-NFAT, ATM) (Figure 4B and Table 3); repression following a treatment (Figure 4A, Cluster III) enriched for angiogenesis, cell development, and regulation of metabolism (Figure 4B and Table 3); and IL-13–dependent response, characterized by induction only in WT and not in IL-13–KO mice (Figure 4A, Cluster IV). A more detailed analysis of this cluster is described below.
Module Network Validation
Potential Genes with Novel Role in Asthma Induction The genes in the module include several chemokines such as CCL11 (Eotaxin, a known asthma regulator [40]); immunoglobulin-related molecules (IGHA1, IGJ); molecules suggested to be involved in allergic lung inflammation based on array data (ITLNA, CALCA3 [8], ARG1 [4], SPRR2A [6]); and chitinase family members CHIA and CHI3L3. Recently Homer and colleagues (41) demonstrated that both CHIA and CHI3L3 were induced in models of allergic lung inflammation. However, they differed in their distribution—CHIA was expressed in distal airway epithelial cells in which mucus was not expressed, while CHI3L3 expression was limited to central or proximal mid-airways but not distal. While they observed that both were induced by ovalbumin or IL-13 induction, in this module CHIA induction is dependent on an intact IL-13 pathway and CHI3L3 is not. The enrichment of this module with asthma-proven relevant genes should encourage further study of the role of other molecules in this module, such as SERPINA2G (a proteinase inhibitor that regulates cathepsin B activity) and SAA3 (a member of the serum amyloid protein family).
Potential Role of TGF-β1 in IL-13–Induced Allergic Lung Inflammation Additional examples of ingenuity networks found in clusters I, II, and III are available on the AsthmaMap website.
The aim of this article is to provide a framework for systems-level analysis of disease-relevant data from multiple sources. Despite inherent difficulties (differences in model parameters, nonideal experimental design, and limited number of animals in experimental subsets), we observe highly meaningful and reproducible patterns and themes that characterize allergic lung inflammation, and are robust to a specific model setup. One principle applied in this study is gene sets analysis rather than individual gene analysis; here the main benefit is the robustness to noise. Such robustness was critical when uncovering heterogeneity between individual animals, which is not dependent on changes in the levels of single genes (Figure 2). One concern in applying gene expression studies in human clinical research is diversity in genetic background that dictates individual variability. By being less affected by changes in the levels of single genes, gene set profiles allow us to observe changes in global trends and themes in response to a stimulus or an intervention. Such analyses may be highly useful in applying gene expression approaches to guiding human clinical research and potentially, in the future, disease diagnosis and management. When the gene sets analysis principle is applied to gene expression signatures, a global view of the nature of the genes that are changed during allergic lung inflammation in the mouse lung emerges (Figure 3). This view provides system-level support to previous hypotheses as well as generation of new ones. As an example in this analysis, we found that protease inhibitor activity and eicosanoid metabolism characterize HDM-induced signature in WT but not IL-13–KO mice (Figure 3F). Considering the difference in response to allergen in IL-13–KO and WT mice, the difference in eicosanoid metabolism is predictable: this pathway is often implicated in asthma in humans. More specifically, recently Shim and coworkers (42) demonstrated that ALOX5, a key enzyme in arachidonic acid metabolism, mediates IL-13–induced pulmonary inflammation; Trudeau and colleagues (43) also demonstrated that PTGS2, another key enzyme in this pathway, is regulated by IL-13 in airway epithelial cells. Our findings, which are based on microarray data created years before these articles and on analysis performed independently, provide a systems-level support for the observations by Shim and coworkers and Trudeau and colleagues. In parallel, their results validate and support our analyses. The differences in protease inhibitors activity, however, were not reported so far. These differences suggest that IL-13 effects on airway inflammation and remodeling may also be mediated by modulation of anti-protease activity, a finding that may have important therapeutic implications. An additional principle used in this study is an unbiased integration of multiple levels of biological information. This principle aims to enhance what the biologists often do, which is to prioritize hypotheses and insights based on their knowledge by using tools that integrate such knowledge. A good example of this approach is the potential regulatory role for TGF-β1 in allergic lung inflammation that we propose. This observation became obvious when we combined gene expression analysis, gene set analysis, and protein–protein interactions information. In fact, the role of TGF-β1 in allergic lung inflammation is not completely understood. Increased levels of TGF-β1 and evidence for TGF-β1 activation have been found in airways, bronchoalveolar lavage, and cells from of patients with asthma (44–46). Polymorphisms in the TGF-β1 promoter have been identified in patients with asthma (47–49) as well as increased levels of TGF-β2 in patients with severe asthma (50). Recently, Leung and coworkers demonstrated that inhibition of TGF-β1 receptor kinase reversed bronchial hyperreactivity in a murine model of allergic lung inflammation (51). Similar results were obtained by Hirano and colleagues with pirfenidone, an antifibrotic agent (52), and by Nakao and coworkers using transgenic mice that overexpress SMAD7, an inhibitor of TGF-β1 signaling (53). Lee and colleagues (54) demonstrated that IL-13–mediated pulmonary fibrosis was mediated through TGF-β1, as did Fichtner-Feigl and coworkers (55). Zhou and colleagues demonstrated synergism between IL-13 and TGF-β1 in TIMP1 induction (56). In our analysis, lungs of mice after antigen challenge are almost universally enriched with genes induced by TGF-β in airway resident cells (Figure 3G). In addition, protein network analysis demonstrates that IL-13–dependent gene cluster of modules is significantly regulated by TGF-β1 (Figure 5B). Our results indicate that TGF-β1 is induced early in allergic asthmatic response and may play a significant role in all of its stages and not necessarily only in the remodeling phase. Together with the murine TGF-β1 inhibition experiments, these findings suggest that modulating TGF-β1 signaling in the airway may be a potential target for therapeutic intervention in asthma. One of the interesting questions arising from gene expression data is whether there are a few key molecules that regulate the expression of the rest of the genes. The Module Networks algorithm attempts to address this question by detecting modules of genes that have a similar transcription under some context (context-specific clustering), and a set of regulators and rules that together predict the transcription levels of the target genes under the different contexts. We presented in details an example of a module (494, Figure 5A) which shows a persistent activation after all types of treatments and consists of many known asthma-related genes, both as module members and as regulators. However, this module also illustrates the limitations of the Module Network approach in gene expression data. It is tempting to hypothesize that the regulators CCL2, IL1RN, and ADAM8 indeed regulate the behavior of the genes in the module. But we cannot rule out that what we obtain is a conditional co-expression that may be driven by regulators outside the data set. Such regulation may occur at the protein level, or by miRNA, or even by simpler mechanisms such as changes in cellular admixtures. Nevertheless, in many cases the transcriptional level reflects a true regulation relationship (32), and the chosen regulators are valid. To address regulatory events that are beyond transcriptional regulation, we need to add more complete biological information, such as physical interactions that support the regulation relationship. The analysis of cluster IV, in which we found that TGF-β may be a regulator of IL-13–dependent genes although its transcriptional levels are not informative, illustrates this point. In conclusion, although many of the observations that we present were found in single datasets or traditional experiments, our global analysis supports the generalizability and reproducibility of these results beyond the specific experimental settings in which they were found. More importantly, by integrating multiple levels of information and complementary analytic approaches, we infer effects of novel regulators that are not necessarily obvious when single datasets are analyzed. Our results demonstrate that the discovery potential in these publicly available datasets is not fully realized. This article and the accompanying AsthmaMap website are a significant step toward realizing this potential.
The authors thank A. Regev, S. Wenzel, D. Sheppard, M. Selman, A. Choi, and D. A. Thompson for their helpful discussions and productive critiques.
The work of N.K., N.F., and N.N. was in part funded by NIH grant HL 073745. N.K. was also supported by a generous donation from the Simmons Family and by NIH grants HL0793941 and HL0894932. N.N. was also supported by a fellowship from the Maydan Foundation and by a Leibniz Research Center fellowship. This article has an online supplement, which is accessible from this issue's table of contents at www.atsjournals.org Originally Published in Press as DOI: 10.1165/rcmb.2007-0151OC on October 5, 2007 Conflict of Interest Statement: N.F. serves as a consultant to Agilent Technologies. None of the other authors has a financial relationship with a commercial entity that has an interest in the subject of this manuscript. Received in original form April 26, 2007 Accepted in final form August 29, 2007
This article has been cited by other articles:
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||