Published ahead of print on January 23, 2003, doi:10.1165/rcmb.2002-0205OC
© 2003 American Thoracic Society
ORFeome-Based Search of Airway Epithelial Cell-Specific Novel Human ß-Defensin Genes
Center for Comparative Respiratory Biology and Medicine, University of California at Davis, Davis, California
Address correspondence to: Reen Wu, Ph.D., Center for Comparative Respiratory Biology and Medicine, Surge 1 Annex, Room 1121, University of California at Davis, One Shields Ave., Davis, CA 95616. E-mail: rwu@ucdavis.edu
ß-Defensins are among the major host defense shields produced by various tissues and organs against microbial infection. To date, four human ß-defensin (DEFB) gene products that share a consensus six-cysteine motif have been discovered. A hidden Markov model (HMM) profile was constructed from the common features of the known ß-defensin peptides to search for additional novel DEFB genes. A genome-wide search with this profile against ORFeome-based peptide databases (e.g., the Ensembl project) led to the identification of six new DEFB members that also share the conserved six-cysteine motif. Phylogenetic analysis supported a close relationship of these six new members with existing DEFB genes. Polymerase chain reaction (PCR) studies of human tissue cDNA panels confirmed the expression of all six novel DEFB genes in various tissues. Two of them, DEFB106 and DEFB109, were expressed in the lung. A pilot study with cRNA probes for in situ hybridization and a synthetic propeptide for functional characterization demonstrated the tissue-/cell-specific expression and the strong antimicrobial activity of DEFB106. These results support the utility of ORFeome-based HMM search in gene discovery for members of a specific gene family. The novel DEFB genes identified in this study may contribute significantly to overall antimicrobial host defenses.
Abbreviations: human ß-defensin, DEFB; digoxigenin, DIG; expressed sequence tag, EST; glyceraldehyde-phosphate-dehydrogenase, GAPDH; hidden Markov model, HMM; Luria broth, LB; nonredundant, nr; polymerase chain reaction, PCR; saline sodium citrate, SSC
Antimicrobial peptides of multicellular organisms play an important role in innate defenses against microbial infection (1). Defensins are one of the two major vertebrate antimicrobial peptide families and provide a chemical shield against a broad spectrum of microorganisms (for review, see Refs. 2–5). Human defensins are categorized into α- and ß-defensin subfamilies based on the different organization of their six-cysteine motifs (6). Currently, four human ß-defensins (DEFBs) that share a consensus six-cysteine motif have been assigned to this gene family, and they are all clustered on chromosome 8 (7). Human ß-defensin 1 (DEFB101) was isolated from human plasma (8) and is expressed in most epithelial cells (9). DEFB102 was originally isolated from human skin (10) and is highly expressed in the lung after proinflammatory induction (11, 12). DEFB103 was identified recently by several different approaches, including a genomics-based polymerase chain reaction (PCR) search and traditional peptide purification (13–15). DEFB104, which is induced in lung epithelial cells by cytokines, was discovered through a direct BLAST search of the genome sequence on chromosome 8 (16). All of the known DEFBs are short cationic peptides (1). They share a conserved motif, which is composed of six spaced cysteines (C-X6-C-X4-C-X9-C-X6-CC). The spacing between cysteines is almost fixed except for DEFB104, which has three amino acids instead of four between the second and third cysteines, and five amino acids instead of six between the fourth and fifth cysteines (17). The DEFB coding sequence is organized into two exons. The translation product begins with an N-terminal signal peptide, which is cleaved to yield a propeptide and then a mature peptide.
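The six-cysteine motif quoted above can be expressed as a simple pattern, which may help make the spacing rule concrete. The following is a minimal sketch, not a method used in this study: the function name `six_cysteine_spacings` is hypothetical, and the pattern is widened by assumption to admit the DEFB104 variants (three residues instead of four, five instead of six).

```python
import re

# Consensus quoted in the text: C-X6-C-X4-C-X9-C-X6-CC.
# Assumption: the second and fourth gaps are widened to {3,4} and {5,6}
# so that the DEFB104-style spacing described above also matches.
SIX_CYS = re.compile(r"C(.{6})C(.{3,4})C(.{9})C(.{5,6})CC")


def six_cysteine_spacings(peptide):
    """Return the four gap lengths of the first motif match, or None.

    The gaps are the residue counts between cysteines 1-2, 2-3, 3-4 and 4-5;
    cysteines 5 and 6 are adjacent (the terminal "CC").
    """
    match = SIX_CYS.search(peptide.upper())
    if match is None:
        return None
    return [len(gap) for gap in match.groups()]
```

A pattern of this kind only checks cysteine spacing; it says nothing about charge, signal peptides, or overall sequence context, which is one reason a profile-based search is used in the study instead.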
It is possible that there are more DEFB genes yet to be discovered in the human genome, both because ß-defensins exist as large families in species such as cows and mice (2) and because of the high frequency of gene duplications within ß-defensin clusters (7, 18). One general approach for discovering new members of a gene family is to search the nucleotide databases for sequences similar to those of the family with the BLAST program from NCBI (http://www.ncbi.nlm.nih.gov/BLAST) (19). This was the primary approach that led to the discovery of DEFB103 and DEFB104. However, like many other gene families, members of the DEFB gene family do not share much sequence similarity; rather, they share only some motifs that may be evolutionarily conserved and required for all of the members of this gene family to function. Because the BLAST algorithm focuses only on local sequence similarity, it may not recognize motif-based similarity. This limitation can be overcome by the construction of sequence profiles, namely "hidden Markov models" (HMMs), which summarize the specific features of a set of sequences from the known members of a given gene family (20–24). Searching a database with an HMM is analogous to looking for the general "features" of those genes rather than just similar DNA sequences. For motif-defined families, this search method is expected to be more robust and specific than a BLAST search. Using this HMM searching method, Schultz and coworkers (25) discovered more than 1,000 putative new human small GTPase proteins. In this report, we describe an alternative HMM search against ORFeome-based peptide databases (e.g., the Ensembl project), combined with EST data mining, to obtain more informative results, such as evidence of expression and the complete open reading frame (ORF) containing the six-cysteine motif, rather than merely segments of genomic DNA sequence.
To our knowledge, no other study has used the Ensembl predicted human peptide database to find novel human ß-defensins. We believe that ORFeome-based HMM search will become more useful than data mining on genomic databases after the draft completion of the human genome project, as more and more genes are predicted with various algorithms (26). In this communication, we have used this approach to identify six additional novel DEFB genes (see footnote 1). RT-PCR analysis with cDNA panels of various tissues confirms the expression of these genes. A pilot study of DEFB106, one of the newly found DEFB genes, using cRNA probes for in situ hybridization and a synthetic propeptide, demonstrates the utility of this approach for finding new antimicrobial gene products with tissue-/cell-specific expression.
In Silico Cloning of the Novel Human ß-Defensin Genes
We collected all the known ß-defensin peptide sequences, including those of human and other animal species, from the NCBI database. All of the sequences were selected and processed with BLAST2 (NCBI software program). Only the most representative sequences were preserved. These 59 known ß-defensin sequences were then aligned by the ClustalW program (LASERGENE package; DNASTAR Inc., Madison, WI). The complete NCBI nonredundant (nr) database (August 2001) and the Ensembl predicted human peptide database (version 1.02, http://www.ensembl.org/, Ensembl Genome Browser) were downloaded to an in-house Linux computer. HMMER2.2 software was downloaded from Sean Eddy's Lab Home Page at Washington University in St. Louis, MO (http://hmmer.wustl.edu/) and set up on the same Linux machine. A HMMER model was built based on the alignment data; the scores of the predicted sequences were calibrated with the HMMCALIBRATE program. The model was then used to search against the NCBI-nr and Ensembl-predicted peptide databases by HMMSEARCH. The default cutoff value (E-value ≤ 10) was used to filter the results. A manual evaluation of the list was performed to discard known peptides, hits without the six-cysteine motif, and redundant hits. The remaining hits were then searched against the Swiss-Prot and translated EMBL protein databases by the Smith-Waterman algorithm of GCG/SeqWeb. After all known peptides were discarded, the remaining hits were searched using the TBLASTN program against the NCBI human EST database. The hits lacking significant scores were discarded. All remaining hits were deemed novel members of the human ß-defensin gene family. This procedure is summarized in Figure 1.
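The idea of building a family model from an alignment can be pictured with a much simpler stand-in: a table of per-column amino acid frequencies that is then used to score a candidate segment. The sketch below is an illustration under stated assumptions and is not how HMMER works internally; a real profile HMM also models insertions and deletions and uses more elaborate priors. The function names, the uniform pseudocount, and the uniform background frequency are all assumptions introduced here.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(alignment, pseudocount=1.0):
    """Per-column residue frequencies from equal-length aligned sequences.

    Gap characters ('-') and non-standard residues are ignored. A uniform
    pseudocount (an assumption of this sketch) keeps every frequency above
    zero so that unseen residues can still be scored.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


def log_odds_score(profile, segment, background=1.0 / 20.0):
    """Sum of log2(frequency / background) over an ungapped segment.

    The segment must be as long as the profile and contain only the 20
    standard residues; the uniform background is a simplifying assumption.
    """
    if len(segment) != len(profile):
        raise ValueError("segment length must equal profile length")
    return sum(math.log2(profile[i][aa] / background) for i, aa in enumerate(segment))
```

With a toy alignment such as `["CAKC", "CGKC", "CARC"]` (made up for illustration), a segment resembling the seeds scores higher than an unrelated one, which is the intuition behind searching with a family profile rather than with a single query sequence.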
Phylogenetic Study
For propeptide alignment, predicted protein sequences were trimmed to exclude the signal peptide, as predicted by the SignalP WWW Server (http://www.cbs.dtu.dk/services/SignalP/). The sequences without the signal peptide were aligned using ClustalW from the LASERGENE package. The comparison matrix was set to the Gonnet series protein weight matrix with a gap penalty of 10 and a gap length penalty of 0.2.
Genomic Structure and Localization
Expression Analysis
In Situ Hybridization
Human airway tissues were obtained from the UC Davis Medical Center with patient consent; the UC Davis Human Subject Review Committee approved and periodically reviewed this process. The tissues were fixed in formalin and processed for paraffin blocks. Paraffin sections of these tissues were placed on glass slides and hybridized in prehybridization solution using digoxigenin (DIG)-labeled antisense or sense probes. These probes were synthesized by in vitro transcription (DIG RNA Labeling Kit; Roche Molecular Biochemicals, Indianapolis, IN) of full-length DEFB106 cDNA from EST AW103145 (Invitrogen Corporation, Carlsbad, CA). In situ hybridization was performed as per the manufacturer's protocol (Roche Molecular Biochemicals). Briefly, slide sections were incubated with 10 µg/ml Proteinase K in 50 mM Tris-Cl (pH 8.0) and 50 mM ethylenediamine tetraacetic acid for 15 min at 37°C, rinsed twice in 1x phosphate-buffered saline, and then postfixed in 4% paraformaldehyde/phosphate-buffered saline for 5 min. Slide sections were blocked for 10 min by 0.25% acetic anhydride in 0.1 M triethanolamine (pH 8.0). For each slide section, 30 ng of DIG-labeled RNA probe in 50 µl of hybridization buffer was applied. The hybridization buffer contained 2x saline sodium citrate (SSC), 1x Denhardt's solution, 10% dextran sulfate, 50 mM phosphate buffer (pH 7.0), 50 mM dithiothreitol, 250 µg/ml yeast tRNA, 100 µg/ml poly A, and 500 µg/ml salmon-sperm DNA. The hybridization was performed overnight at 50°C in a humidified chamber. After hybridization, the section was washed twice at 37°C for 15 min each time with 2x SSC, then once for 30 min with 2x SSC containing 20 µg/ml RNaseA, twice for 15 min each time with 1x SSC, and two more times for 15 min each time with 0.25x SSC. After all of these washes, the slide was reacted with anti-DIG primary antibody conjugated with alkaline phosphatase. After several more washes with 1x SSC, the reacted probes in the slide were color developed with the Biotin Nucleic Acid Detection kit from Roche Molecular Biochemicals.
Antimicrobial Assay
Identification of Novel Human ß-Defensin Genes
We retrieved all 146 ß-defensin peptide sequences from the NCBI protein database that contained the keyword "ß-defensin." These sequences were parsed to keep only the nr sequences. There were 59 nr ß-defensin sequences distributed across many species, including human, chimpanzee, monkey, mouse, rat, bovine, pig, sheep, goat, chicken, and turkey. All 59 peptide sequences, excluding the signal peptide sequences, were aligned together by the ClustalW program. An HMM profile of the ß-defensin family was then built from this multiple alignment. After calibration, the HMM profile was used to search against the NCBI-nr and Ensembl-predicted peptide databases: 95 hits were found from the search on NCBI-nr and 13 hits from the search on Ensembl. The full names of these hits were retrieved from the NCBI database. A repeated evaluation of these hits was performed to remove hits of already identified peptides, hits from nonhuman species, hits without a six-cysteine motif, and all duplicate hits (Figure 1). The remaining eight hits were further analyzed by a search against the Swiss-Prot and translated EMBL protein databases by the Smith-Waterman algorithm on GCG/SeqWeb to ensure the novelty of these peptides. One hit turned out to be the known DEFB103. The remaining seven hits had no significant similarity in amino acid sequence to the known ß-defensins other than the six-cysteine motif. All seven novel peptides were analyzed by TBLASTN against the human EST database from NCBI to verify gene transcription. One candidate gene that could not be linked to any human EST clone was discarded. The final six candidates were then subjected to phylogenetic analysis, chromosomal localization analysis, and gene expression analysis. This procedure is summarized in Figure 1, and the representative EST clones of these novel DEFBs (DEFB106, DEFB108, DEFB109, DEFB118, DEFB129, and DEFB131) are shown in Table 2.
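The evaluation criteria listed above (cutoff score, human origin, novelty, presence of the six-cysteine motif, and removal of duplicates) can be pictured as a filtering cascade. The sketch below is a hypothetical automation of those criteria for illustration only; in the study this evaluation was performed manually. The `Hit` record, its field names, and the widened motif pattern are assumptions and do not correspond to any HMMER output format.

```python
import re
from dataclasses import dataclass

# Assumed pattern for the six-cysteine motif, widened to admit the
# DEFB104-style spacing variants.
SIX_CYS = re.compile(r"C.{6}C.{3,4}C.{9}C.{5,6}CC")


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one profile-search hit."""
    name: str
    species: str
    evalue: float
    sequence: str


def filter_hits(hits, known_names, max_evalue=10.0):
    """Keep human, novel, motif-bearing, non-duplicate hits under the cutoff."""
    seen_sequences = set()
    kept = []
    for hit in hits:
        if hit.evalue > max_evalue:
            continue  # above the E-value cutoff
        if hit.species != "Homo sapiens":
            continue  # nonhuman hit
        if hit.name in known_names:
            continue  # already identified peptide
        if not SIX_CYS.search(hit.sequence):
            continue  # no six-cysteine motif
        if hit.sequence in seen_sequences:
            continue  # duplicate of a hit already kept
        seen_sequences.add(hit.sequence)
        kept.append(hit)
    return kept
```

Hits that survive such a filter would still need the later steps described in the text, namely the Smith-Waterman novelty check and the TBLASTN search for supporting ESTs.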
The naming of the newly identified genes in this study conforms to those previously predicted by Schutte and coworkers (28).
Homology to the Known DEFBs
The phylogenetic tree from the multiple sequence alignment (Figure 3) shows the relative relationship among all new DEFB peptides. DEFB109 is in the same clade as DEFB101, DEFB102, and DEFB103, whereas DEFB129 and DEFB131 are in another clade with DEFB104. Both DEFB108 and DEFB118 are close to DEFB104, but not within the clade of DEFB101/DEFB102. DEFB106 has the most remote relationship to the other DEFB genes. It is located at the root of the phylogenetic tree and could be the ancestor of all DEFBs.
Genomic Structure of the Novel DEFBs
The genomic DNA organization of these newly found DEFBs was predicted according to the UCSC human genome draft (Figure 4). The four known DEFBs have two exons and one 4-kb intron. The BLAT search of the UCSC human genome draft revealed that three of these newly found DEFB genes (DEFB106, -108, -109) were clustered on chromosome 8p23.1. DEFB118 and DEFB129 were localized to chromosomes 20q11.21 and 20p13, respectively. The chromosomal location of DEFB131 was ambiguous and could not be clearly determined with the current BLAT search.
Tissue Expression Profiles
Using panels of cDNA sets prepared commercially from various adult human tissues, PCR was performed to amplify DEFB messages with specific primer sets (Figure 5). The amplification of GAPDH message in these tissues was used as an internal control. All the ß-defensin PCR-amplified DNA products were isolated and sequenced to confirm the authenticity of these amplifications. Except for DEFB108 and DEFB109, whose putative intron information is not available, all of the primer pairs were designed to be intron spanning. The correct amplicon size and DNA sequencing results indicated that there was no genomic DNA contamination in these cDNA preparations. These results further support the expression of the novel DEFB genes obtained from the genome-wide search. Interestingly, all of these novel DEFBs were expressed in testis (Figure 5). With the exception of DEFB108 and DEFB129, the newly found DEFBs were also expressed in tissues other than testis. DEFB109 message was ubiquitously expressed in all human tissues tested, with relatively higher levels of expression in heart, brain, lung, liver, kidney, pancreas, testis, and ovary. DEFB131 was moderately expressed in the prostate and small intestine. DEFB118 message was detected in the pancreas, whereas DEFB106 was expressed in the lung. As a comparison, PCR was also performed on the four previously known DEFB genes in these tissue cDNA panels (Figure 5). DEFB101 was expressed ubiquitously in most tissues, whereas DEFB102 message was found in high abundance in thymus and lung, and in moderate abundance in placenta, liver, spleen, prostate, and ovary. DEFB103 message could be found only in testis, whereas DEFB104 message was highly expressed in testis and low but detectable in placenta, skeletal muscle, kidney, ovary, small intestine, and leukocytes. Table 3 summarizes the expression pattern of the six newly identified DEFBs in various adult human tissues.
Characterization of DEFB106 Expression in Human Lung Tissue Sections
Among the newly found DEFB genes, only DEFB106 and DEFB109 were expressed in the lung. However, DEFB109 expression is not as tissue-specific as that of DEFB106. For this reason, our pilot studies focused on the expression of DEFB106 in lung and on demonstrating its antimicrobial activity. In situ hybridization demonstrated the presence of DEFB106 message in both the airway surface epithelial cells and the serous cells of airway submucosal glands (Figure 6). Interestingly, neither the goblet cell type of the surface airway epithelium nor the mucous cell type of airway submucosal glands (Figure 6B) expressed this message. The sense probe had very low hybridization activity in these tissue sections (data not shown).
Antimicrobial Activity of DEFB106 Propeptide
To further validate this ORFeome-based search approach, we tested the antimicrobial activity of the synthetic DEFB106 propeptide (amino acids 21 to 65) on a common Gram-negative bacterium, E. coli DH5α. As expected, the DEFB106 synthetic propeptide had strong antimicrobial activity against E. coli DH5α (Figure 7). The bactericidal activity was dose-dependent. About 90% of the bacteria were killed by this peptide at a concentration of 12 µg/ml. In contrast, a randomized peptide with the same amino acid composition as the DEFB106 propeptide had no bactericidal activity at this concentration. These results strongly support the antimicrobial nature of the synthetic DEFB106 propeptide.
In this paper, we describe the discovery of six novel DEFBs by applying a computational tool, the HMM search, to predicted peptide databases. Compared with the nucleotide BLAST search, the HMM method is much more sensitive because the profile summarizes the features of the whole family. The key point for a successful HMM search lies in constructing the HMM profile. The general process of constructing the HMM profile for this study was detailed in MATERIALS AND METHODS and summarized in Figure 1. Generally, a set of known gene family members must be aligned and used as "seeds"; a computer program (HMMER in this study) will then compute the frequency of each amino acid at each position in the "seeds" to construct an HMM profile. To avoid under- or overrepresentation of the gene family, as many nonredundant sequences as possible must be chosen. Use of wrong sequences or failure to include informative sequences will generate incorrect or inadequate HMM profiles that would significantly decrease the reliability of the results of subsequent database searching. For this study, we chose the known ß-defensins across all species. A majority of them, in terms of bactericidal activity and the six-cysteine motif, have been experimentally verified. The inclusion of the sequences from other species improves the search specificity by taking into account sequence variations that may be unrelated to function. This highly restricted HMM profile allowed us to identify new genes that share the consensus six-cysteine motif and the cationic nature of the amino acid sequences that are known characteristics of the existing DEFB members. We also took advantage of the abundant information available in the NCBI nr and Ensembl predicted peptide databases for the HMMER search. The Ensembl database not only has all known protein sequences but also has protein sequences predicted from EST and genomic sequences, further increasing the power of the screening strategies used in this study. We recognize some bias with this approach.
The major issue is related to the incompleteness of the existing databases. The degree to which the current databases are complete is not known. Our main purpose in this study was to demonstrate the feasibility of using the approach, to test the strategy, and to explore for new ß-defensin genes expressed in airway tissues. Our studies are not designed to be exhaustive. When the algorithms for gene prediction become more accurate, the strategy of ORFeome-based searches will be more comprehensive. To prove the principle, these newly described genes were further analyzed by sequence alignment, phylogenetic analysis, chromosomal localization, expression pattern, and demonstration of antimicrobial activity. The sequence alignment demonstrated that these newly found DEFBs share the conserved six-cysteine motif and the cationic features of the peptide sequence of the currently known DEFB genes. The phylogenetic analysis provides further supporting evidence that these genes are members of the ß-defensin gene superfamily, similar to the known four DEFB genes. Chromosomal localization shows that three of the newly found DEFB genes are located at the chromosomal 8p23 locus. This locus also contains all known existing DEFB genes. The loci for the other two new DEFB genes are at chromosome 20p13 and 20q11. These results suggest that all these DEFB genes could have originated from a common ancestor through gene duplication. Because evolution is often indolent, more genes are believed to be generated through duplication and modification of existing genes to meet diverse microorganism challenges; these genes diverge and are altered, but still retain functions related to those of their ancestral genes. One should notice the selection of the DEFB109 gene by this search, despite the absence of information on the N-terminus. 
We tried to manually extend the 5' end information of this gene, but we could not find the translation start site (ATG) in the upstream region of this gene based on the information extracted from the genome draft and EST database. This problem should be resolved in the future when more EST data are available or the human genome draft is more accurately assembled. The remaining five newly found DEFB genes have a proper translation start site and a translatable sequence with a proper termination codon. This information further supports the translation of these genes into proteins. To understand the function of these genes, we analyzed by PCR the expression of these 10 members of the ß-defensin gene family in various tissues. Interestingly, except for DEFB102, all these DEFB genes are expressed in testis. It is unclear why the testis needs so many different kinds of ß-defensins. This result may help to explain why microbial infection is rare in testis, even though this tissue is known to have limited protection by the acquired immune system (29). It has been reported that the epididymis-specific EP2 protein has a structure similar to that of ß-defensins (30). The sperm oocyte-binding protein 3, SOB3, which is also expressed in testis, is reported to have a sequence highly homologous to FALL-39 and CAP18, members of the cathelicidin antimicrobial peptide family in humans (31). An antimicrobial peptide gene with the conserved six-cysteine motif, Bin1b, was reported to be specifically expressed in the epididymis of rats (32). All these peptides appear to have roles in the male reproductive system that are not limited to antimicrobial activity. The DEFBs expressed in testis may likewise have such additional roles. In addition to testis, four of the six newly found DEFBs are expressed in other tissues. The functions of these DEFBs in these tissues remain to be clarified.
Like the existing DEFB101, DEFB109 is expressed in a variety of tissues, including the lung. Apart from testis, DEFB106 is expressed only in lung. Thus, there are at least four DEFBs expressed in the lung; these are DEFB101, DEFB102, DEFB109, and DEFB106. We could not find any PCR product for DEFB104 in the lung of this cDNA tissue panel. This contrasts with a recent report that showed the induction of DEFB104 message in airway epithelial cells by bacterial products and phorbol 12-myristate 13-acetate (PMA) (17). Information about the conditions under which the human tissues were collected to establish the cDNA panel is currently unavailable. It is possible that the lung tissue sample was obtained from a donor without bacterial infection or a PMA-equivalent stimulus. Further study of the regulation of the expression of these genes by cytokines and bacterial products is needed to resolve the discrepancy between this study and the other report. To further support the utility of this ORFeome-based search, we chose DEFB106 for additional study by in situ hybridization and to assess its antimicrobial activity, because its expression was observed specifically in lung. In situ hybridization demonstrated abundant DEFB106 message in the airway serous cells of submucosal glands and nongoblet cells of the surface epithelia. This is similar to what was found for DEFB102 message (33).
These results are also consistent with the notion that nonmucus-secreting cell types in the airway surface epithelia and the submucosal gland are responsible for the secretion of bactericidal proteins (such as lysozyme in addition to DEFBs), and also for the secretion of a variety of nonmucus-related substances, such as proteinase inhibitors (e.g., SLPI), iron-chelating agents (e.g., lactoferrin), and ion/water transports that regulate the homeostasis of airway lumen (34, 35), whereas mucous and goblet cell types are responsible for the secretion of large molecular weight mucins that can trap inhaled particulates and microorganisms. We also demonstrated that the DEFB106 propeptide had an excellent antimicrobial activity. The LD90 (the dose that achieves 90% reduction of colony-forming units) for DEFB102 in killing the gram-negative bacteria E. coli is near 10 µg/ml. We found that the LD90 for DEFB106 is also near 10 µg/ml. Further characterization of the spectrum of antimicrobial activity to determine the specific role of DEFB106 in host defense is warranted.
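The LD90 defined above (the dose giving a 90% reduction in colony-forming units) can be estimated from a dose series by interpolation. The following is a minimal sketch under stated assumptions: it uses simple linear interpolation between the two bracketing doses, which is not necessarily how the values in this study were derived, and the function names are hypothetical. Any numbers passed to it in an example would be illustrative, not data from this paper.

```python
def percent_killed(cfu_treated, cfu_control):
    """Percent reduction in colony-forming units relative to the untreated control."""
    return 100.0 * (1.0 - cfu_treated / cfu_control)


def interpolate_ld90(doses, cfu_counts, cfu_control, target=90.0):
    """Estimate the dose giving `target` percent killing.

    `doses` must be in ascending order, with one CFU count per dose.
    Linear interpolation between the bracketing doses is an assumption of
    this sketch. Returns None if the target is never reached.
    """
    kills = [percent_killed(cfu, cfu_control) for cfu in cfu_counts]
    if kills and kills[0] >= target:
        return doses[0]  # target already reached at the lowest dose
    for i in range(len(doses) - 1):
        k0, k1 = kills[i], kills[i + 1]
        if k0 < target <= k1:
            d0, d1 = doses[i], doses[i + 1]
            return d0 + (target - k0) * (d1 - d0) / (k1 - k0)
    return None
```

For example, with a made-up series in which killing rises from 80% at 10 µg/ml to 98% at 20 µg/ml, the function returns a value between those two doses; a finer dose series would narrow the estimate.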
While this manuscript was in preparation, a paper was published that also described a genome-wide search for ß-defensin genes (28). Their data were consistent with this study, which further supports the utility of HMM search. The computational methods in both studies were similar, but the strategy and the databases used for the screening were different. In their study, Schutte and colleagues (28) screened six-frame translated human genomic sequences and located 28 new human ß-defensin genes, which significantly exceeds the number we found. However, the newly identified DEFB genes in their study are partial genomic DNA sequences that cover the six-cysteine motif only; there is no experimental evidence to verify whether these genes are expressed at all or associated with antimicrobial activity. The ORFeome-based search is more informative because it provides evidence that these novel genes are expressed. A possible source of bias in their approach came from the use of highly noisy genomic data. Thus, as discussed by the authors, some uninformative hits might have been found in their initial screening, which reduces the accuracy of their method. Further examination of the 28 putative DEFBs in their report reveals several unusual candidates whose amino acid spacing between the cysteine residues of the six-cysteine motif differs greatly from the spacing in well-known ß-defensin genes. For instance, the "DEFB107" candidate gene described in their paper has 21 amino acids instead of six between the first and the second cysteines. Because the spacing and the specificity of six-cysteine motif will differentiate the 3-D structure and the biological activity of ß-defensin gene from the …
In summary, the current studies describe a novel computational gene discovery strategy to identify six new human ß-defensin genes.
Preliminary studies using Gram-negative bacteria have confirmed antimicrobial activity for the product of one of the newly identified genes, DEFB106, which is specifically expressed in airway tissue and testis. These experimental results demonstrate not only the power of ORFeome-based HMM search, but also the value of this approach in identifying tissue-specific DEFB genes.
This manuscript is supported in part by NIH grants (HL35635, ES06230, ES09701, AI50496, ES04699 and ES05707) and the California Tobacco-Related Disease Research Program (10RT-0262). Drs. Cheryl Soref and Carroll E. Cross are thanked for their critical review and editing of this manuscript before submission. The authors also thank Andrew Last for his professional editing service on this manuscript.
1 The nucleotide sequences reported in this paper have been submitted to the GenBank DataBase with the accession numbers AF529413, AF529414, AF529415, AF529416, and AF529417. We follow the recent recommendation regarding ß-defensin gene nomenclature (http://www.pnas.org/cgi/doi/10.1073/pnas.222517899).
Received in original form October 3, 2002
Received in final form January 10, 2003