Published ahead of print on January 6, 2006, doi:10.1165/rcmb.2005-0404OC
© 2006 American Thoracic Society DOI: 10.1165/rcmb.2005-0404OC
Novel Polymorphisms in the Myosin Light Chain Kinase Gene Confer Risk for Acute Lung Injury
Division of Pulmonary and Critical Care Medicine and the Center for Translational Respiratory Medicine, Bayview Medical Center Genetic and Genomic Research Facility, Department of Epidemiology, The Genetic Resources Core Facility, Johns Hopkins University, and University of Maryland School of Medicine, Baltimore, Maryland; Division of Pulmonary and Critical Care Medicine, Medical College of Wisconsin, Milwaukee, Wisconsin; Division of Pulmonary and Critical Care Medicine, Emory University School of Medicine, Atlanta, Georgia; University of Tennessee, Memphis, Tennessee; and Department of Anthropology, Penn State University, State College, Pennsylvania
Correspondence and requests for reprints should be addressed to Joe G. N. Garcia, M.D., Chairman, Department of Medicine, University of Chicago Pritzker School of Medicine, 5841 S. Maryland Avenue, W604, Chicago, IL 60637. E-mail: jgarcia{at}medicine.bsd.uchicago.edu
The genetic basis of acute lung injury (ALI) is poorly understood. The myosin light chain kinase (MYLK) gene encodes the nonmuscle myosin light chain kinase isoform, a multifunctional protein involved in the inflammatory response (apoptosis, vascular permeability, leukocyte diapedesis). To examine MYLK as a novel candidate gene in sepsis-associated ALI, we sequenced exons, exon-intron boundaries, and 2 kb of the 5' UTR of MYLK, which revealed 51 single-nucleotide polymorphisms (SNPs). Potential association of 28 MYLK SNPs with sepsis-associated ALI was evaluated in a case-control sample of 288 European American subjects (EAs) with sepsis alone, subjects with sepsis-associated ALI, or healthy control subjects, and a sample population of 158 African American subjects (AAs) with sepsis and ALI. Significant single-locus associations in EAs were observed between four MYLK SNPs and the sepsis phenotype (P < 0.001), with an additional SNP associated with the ALI phenotype (P = 0.03). A significant association of a single SNP (identical to the SNP identified in EAs) was observed in AAs with sepsis (P = 0.002) and with ALI (P = 0.01). Three sepsis risk-conferring haplotypes in EAs were defined downstream of the start codon of the smooth muscle MYLK isoform, a region containing putative regulatory elements (P < 0.001). In contrast, multiple haplotypic analyses revealed an ALI-specific, risk-conferring haplotype at the 5' end of the MYLK gene in both European and African Americans and an additional 3' region haplotype only in African Americans. These data strongly implicate MYLK genetic variants in conferring increased risk of sepsis and sepsis-associated ALI.
Key Words: MYLK/MLCK; genetic association; SNP; ALI; sepsis
Acute lung injury (ALI) is characterized by profound inflammation, increased vascular permeability, and alveolar flooding, a combination of events which frequently results in acute respiratory failure. The incidence of ALI in the United States (17-64 per 100,000 person-years) is higher than in other developed countries, with mortality rates for patients with acute respiratory distress syndrome, the more severe form of ALI, ranging from 34-58% (1, 2). The risk of ALI appears to be disproportionately higher in African Americans, an observation which cannot be explained by socioeconomic factors alone (3), suggesting a genetic influence on susceptibility and outcome. Studies on the genetic basis of sepsis, the most common predisposing condition leading to ALI, are limited but have elucidated several candidates, including a single-nucleotide polymorphism (SNP) in the TNF promoter (-308) (4) and a promoter polymorphism in the CD14 gene (-260) (5). A single variant (insertion/deletion) in the gene encoding angiotensin-converting enzyme (ACE) is associated with ALI, with patients homozygous for the deletion (and therefore carriers of the ACE DD genotype) appearing to be at high risk (6). Similarly, an SNP in the surfactant protein-B gene (1,580 C/T) is a risk factor for acute respiratory distress syndrome (7-9). Although intriguing, these studies are limited by a focus on single variants within a candidate gene, the absence of adequate methodologies, and the complete lack of validation in replicate independent populations. Despite these caveats, it remains widely believed that the identification of genetic polymorphisms in candidate genes may provide new insight into the molecular pathogenesis of sepsis and ALI and lead to the development of new diagnostic and therapeutic targets. The absence of available families with a history of ALI has precluded linkage analysis approaches for examining the genetic basis of ALI. 
As a result, studies have used the candidate gene approach based on extensive expression profiling, the selection of putative candidate genes emanating from pathway analysis, or reports of similar association in related disorders. Based upon a systematic interrogation of endothelial barrier properties under conditions of lung inflammation, a defining feature of sepsis and ALI, we have speculated that the gene encoding human myosin light chain kinase (MLCK), spanning 217 kb on chromosome 3q21, may represent a viable candidate gene involved in ALI susceptibility and disease. The human MYLK gene encodes three proteins within a single gene, including the nonmuscle and smooth muscle MLCK isoforms (10-12). In addition, using a separate promoter in an intron in the 3' region, MYLK encodes telokin, a small protein identical in sequence to the C-terminus of MYLK that is independently expressed in smooth muscle and functions to stabilize unphosphorylated myosin filaments (13). A pseudogene is located on the p arm of chromosome 3 (14). Seven transcript variants that produce seven isoforms of the calcium/calmodulin-dependent enzyme, as well as two transcripts that produce two isoforms of telokin, have been identified. We have previously demonstrated that the nonmuscle MLCK isoform encoded by MYLK is a multifunctional protein centrally involved in multiple aspects of the inflammatory response, including apoptosis, vascular barrier regulation and permeability, and leukocyte diapedesis (15-18). MLCK is a molecular target in ventilator-associated lung injury (19), and in vivo studies in MYLK mutant mice demonstrate an essential role for MYLK in murine sepsis, again implicating MYLK as a potential drug-discovery target (20). Despite this compelling rationale, common variants in MYLK have yet to be identified and directly implicated as major causal alleles.
To characterize the functional role of the MYLK gene as a potential ALI candidate, we performed direct sequencing of the MYLK gene, which contains 32 exons, covering exons, exon-intron boundaries (including 100 bases of intronic sequence on either side), and 2 kb of the 5' UTR in 36 subjects (European Americans [EAs] and African Americans [AAs]) who had sepsis or sepsis-associated ALI or who were healthy control subjects, and identified 51 SNPs. We combined our SNP discovery and data available from public resources to construct two sets of biallelic markers with minor allelic frequency (MAF)
Human MYLK Gene Sequencing and Polymorphism Analysis
Genomic DNA was extracted and purified from lymphocyte buffy coats removed from 20 ml of EDTA-treated blood using a commercial kit (PUREGENE; Gentra Systems, Inc., Minneapolis, MN). DNA was stored at -20°C in tris-EDTA buffer until further use. The full-length MYLK gene (GenBank Accession no.: U48959; 217.6 kb containing 32 exons) (11) was assessed by direct sequencing of PCR amplicons using individual DNA samples from 36 subjects with either ALI (n = 12) or sepsis (n = 12) or who were healthy control subjects (n = 12), comprised equally of EAs and AAs. PCR primers were designed to amplify exons, exon-intron boundaries (including 100 bases of intronic sequence on either side), the 3' UTR, and 2 kb upstream of the 5' UTR. Primers were synthesized on the ABI 3948 DNA synthesizer (ABI, Foster City, CA). Sequencing was performed on an ABI 3700 sequencer, following standard protocols (21, 22), at the DNA Analysis Facility, Johns Hopkins University (Baltimore, MD). SNPs were identified by manual inspection using Sequencher 4.1 (Gene Codes Corp., Ann Arbor, MI). The representative sequence of the gene encoding human EC MYLK (GenBank accession no. U48959.2; GI:7239695) was used as the reference sequence for the numbering of residues. Nucleotide numbering uses the A of the ATG translation initiation start site as nucleotide +1 for coding SNPs. Positions are given in the corresponding intron/exon based on U48959.2, and the reference intron sequence can be found in genomic contig NT_005543 for intronic variations. Positions for SNPs in the 5' UTR are counted upstream from the ATG, which is found in exon 2.
Patient Cohort Recruitment and Demographics
SNP Genotyping
Genotyping was performed using a 5' nuclease Taqman allelic discrimination assay (Applied Biosystems, Foster City, CA) on the 7900HT Sequence Detection System, which can detect different forms of the same gene that differ by nucleotide substitution. To account for differences in minor allele frequencies (MAF) according to ethnicity, two sets of markers with a MAF ≥ 10% (the exception being four SNPs in the EA panel used for LD testing with MAF between 6 and 8%) were selected to screen the entire MYLK gene. EAs and AAs were analyzed separately. First, we selected SNPs available from the ABI Taqman SNP Genotyping Assay list (Assays-on-Demand, AOD) according to MAF and relative location between SNPs to build the frame of the dense SNP map of the gene. Novel SNPs identified by direct sequencing were next selected to reduce the gaps using the ABI Customized Taqman SNP Genotyping Assay (Assays-by-Design) service or dbSNP (NCBI) for one additional SNP (rs9829784, MYLK_P1). Primers and probe sets were designed on the basis of types of base change and flanking sequences of each SNP. The EA set contained 28 SNP markers (15 Taqman assays and 13 novel SNPs) spanning a total of 214.7 kb of sequence on human chromosome 3 with an average inter-SNP distance of 8 kb. The AA set contained 25 SNP markers (17 Taqman assays and 8 novel SNPs) spanning a total of 204 kb with an average distance of 8.5 kb. A total of 17 SNPs were common markers, with an MAF > 10% in both ethnic groups. Genotyping of 7% of samples was repeated as the quality control procedure.
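The reported marker densities can be checked with simple arithmetic. The sketch below assumes that each quoted span runs from the first to the last marker, so that n markers define n - 1 intervals; the function name is ours and is used only for illustration.

```python
def mean_inter_snp_distance_kb(span_kb: float, n_markers: int) -> float:
    """Average gap between adjacent markers.

    Assumption: the span is measured from the first to the last marker,
    so n_markers markers define n_markers - 1 intervals.
    """
    if n_markers < 2:
        raise ValueError("at least two markers are needed to define an interval")
    return span_kb / (n_markers - 1)


# Spans and marker counts as reported for the two panels.
ea_gap = mean_inter_snp_distance_kb(214.7, 28)  # European American panel
aa_gap = mean_inter_snp_distance_kb(204.0, 25)  # African American panel

print(f"EA: {ea_gap:.2f} kb, AA: {aa_gap:.2f} kb")
```

Under this assumption the EA panel averages just under 8 kb between markers and the AA panel 8.5 kb, in line with the figures quoted in the text.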
Test for Population Structure
Statistical Analysis
Patient Characteristics
The primary population was a European American dataset, which included (1) patients with sepsis-associated ALI (n = 92), (2) patients with sepsis alone (n = 114), and (3) healthy control subjects (n = 85). A secondary population consisted of a cohort of 158 African-American patients also comprised of patients with sepsis (n = 51), patients with sepsis-induced ALI (n = 46), and control subjects (n = 61) (Table 1). All sepsis subjects enrolled had either severe sepsis or septic shock. There were no significant differences in co-morbid factors between the case patients and the control subjects. Age and APACHE II scores were not significantly different between the two populations (European and African American) within a diagnosis, although the age of the control subjects in both groups was significantly different compared with the other groups (P < 0.01). Predictably, survival rates were significantly reduced in ALI cohorts versus sepsis subjects in both ethnic groups (P = 0.04).
Identification of Novel Polymorphisms in the MYLK Gene
Fine Scale Mapping and Intragenic LD Patterns in the MYLK Gene
We selected a set of 36 SNPs (ABI SNP Genotyping list, 21 assays; SNP discovery effort, 15 customized assays) based on (1) gene location, (2) relative distance to each other, (3) minor allele frequency, and (4) compatibility with the genotyping method employed in two subgroups (EA and AA). Priority was given to SNPs in coding regions causing amino acid changes and to SNPs found only in cases by the discovery effort. The selected markers did not demonstrate significant departure from Hardy-Weinberg equilibrium (HWE) in either ethnic group. The EA set contained 28 SNP markers (13 novel SNPs from SNP discovery) spanning a total of 214.7 kb on chromosome 3, resulting in an average 8-kb distance between SNPs. Four SNPs exhibited MAFs between 6 and 8%, and were not used for genetic association testing (Table 2). The AA set contained 25 SNP markers (8 novel SNPs) spanning 204 kb with an average inter-SNP distance of 8.5 kb (Table E2). The D' measure of pairwise LD was estimated between SNPs separately for control subjects from the two ethnic groups, and LD blocks were constructed using confidence intervals (29). Six distinct LD blocks were seen in EAs (estimated from 170 normal control chromosomes) (Figure E2A). Block 3 and block 4 were merged to construct a 5-kb block that encompassed exon 17 (the second exon encoding the smooth muscle MLCK isoform) and surrounding intronic regions (containing SNPs rs820336, rs33262, MYLK_024, MYLK_025, rs33264, and rs11717814). Only two distinct LD blocks were apparent in AAs (120 chromosomes), with two additional blocks identified in which the MAFs of key variants were too low to meet HaploView power requirements (Figure E2B). The first distinct block in AAs also included exon 17 and the upstream intron SNPs (rs820336, rs33262, MYLK_024). 
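As an illustration of the Hardy-Weinberg check mentioned above, the following is a minimal sketch of a one-degree-of-freedom chi-square goodness-of-fit test for a biallelic marker. The genotype counts are hypothetical and are not taken from the study, and the source does not state which HWE test was actually used.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-square test (1 df) for departure from Hardy-Weinberg equilibrium.

    Arguments are observed genotype counts for a biallelic marker.
    Returns (chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("marker is monomorphic; HWE test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts, for illustration only.
chi2, p_value = hwe_chi_square(50, 40, 10)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A large p-value, as in this hypothetical example, indicates no detectable departure from HWE. For markers with low minor allele frequency an exact test is usually preferable to the chi-square approximation.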
To compare LD patterns of MYLK between EAs and AAs, we examined pairwise LD between 17 SNP markers common to both sample subgroups with MAF ≥ 10% (Figure 1B). The MYLK region displayed high levels of pairwise LD between neighboring SNPs and relatively low haplotype diversity in EAs (Figure 1B, left panel), with three distinct LD blocks (block1 = rs820336, rs33262, MYLK_024, MYLK_025; block2 = rs702032, rs1254392, MYLK_036; block3 = rs3845915, MYLK_037). In contrast, only two distinct LD blocks were seen in AAs (Figure 1B, right panel), but these overlapped with blocks found among EAs (block1 = rs820336, rs33262, MYLK_024; block2 = rs820447, rs3845915). In AAs, pairwise LD between neighboring SNPs was low and haplotype diversity was high compared with that seen in EAs. Analysis of 60 randomly selected EA control subjects (to approximate the number of AA subjects) failed to reveal any significant change in the LD patterns described above, suggesting that the difference in sample size does not explain this difference (results not shown). Interestingly, the common allele among EAs was quite rare among AAs for eight MYLK SNPs (Table E2).
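For readers unfamiliar with the D' statistic used in these comparisons, the sketch below computes Lewontin's normalized D' for a pair of biallelic loci. It assumes that phased haplotype frequencies are already known; in practice these must be estimated from unphased genotypes, which this sketch does not attempt. The input frequencies are hypothetical.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Lewontin's normalized D' for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1
          and allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        raise ValueError("D' is undefined when a locus is monomorphic")
    return abs(d) / d_max


# Hypothetical frequencies, for illustration only.
partial_ld = d_prime(p_ab=0.5, p_a=0.6, p_b=0.7)   # moderate LD
complete_ld = d_prime(p_ab=0.3, p_a=0.3, p_b=0.3)  # alleles always co-occur

print(f"partial: {partial_ld:.3f}, complete: {complete_ld:.3f}")
```

Values near 1 indicate little historical recombination between the two sites, which is the property that block-finding procedures exploit.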
Single-Locus Tests for Association between MYLK Polymorphisms, Sepsis, and ALI
We confirmed these associations in AAs for this particular SNP, rs820336, where there appeared to be increased risk for both sepsis ("A" allele for rs820336: OR, 2.46; 95% CI, 1.32-4.60, P = 0.002) and ALI (OR, 2.07; 95% CI, 1.09-3.95, P = 0.02) phenotypes. The "AA" genotype is rare in the African American control subjects (1.7%), compared to the "GG" genotype (55.8%). However, the "AA" genotype frequency significantly increased in both the sepsis (19.6%) and ALI (23.9%) groups; carriers of the "AA" genotype have 18-fold increased risk for both diseases (recessive model) (Table E3). There was evidence for association between ALI and one additional SNP (hcv1602689: OR, 3.5; 95% CI, 1.12-12.90, P = 0.01) (Figure E2). Carriers of the mutant "GG" genotype at MYLK_037 (OR, 2.40; 95% CI, 0.97-5.91, P = 0.05) also showed a trend for association with ALI only among AAs. No significant association was observed between MYLK variants studied and either severity or outcome of disease at the genotypic level.
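Odds ratios and confidence intervals of the kind reported above are typically derived from a 2 x 2 table of exposure by case status. The sketch below uses the standard Woolf (log-based) interval; it is not claimed to be the procedure used in this study, and the counts are hypothetical.

```python
import math


def odds_ratio_woolf(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with a Woolf (log-based) confidence interval.

    a: cases carrying the risk allele or genotype
    b: cases not carrying it
    c: control subjects carrying it
    d: control subjects not carrying it
    z: normal quantile (1.96 for a 95% interval)
    Returns (odds ratio, lower bound, upper bound).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only (not study data).
or_, lower, upper = odds_ratio_woolf(a=20, b=80, c=10, d=90)
print(f"OR = {or_:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

When a cell count is very small, as with a genotype seen in only a handful of control subjects, the interval becomes wide and exact methods are generally preferred.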
Adjusting for Population Stratification in African Americans
Haplotype Analysis using Sliding Window Approach
We first reported the full-length sequence of the MYLK gene spanning 217.6 kb on chromosome 3q21.1 (10, 11) and containing three putative promoter regions and 32 exons which encode the smooth muscle and nonmuscle MLCK isoforms that phosphorylate regulatory myosin light chains. Human nonmuscle cells, such as vascular endothelial cells (EC), only express the nonmuscle MLCK isoform, which contains a novel NH2-terminus stretch (amino acids 1-922) not present in the open reading frame of smooth muscle MLCK (11, 12). Both the nonmuscle and smooth muscle isoforms are post-translationally modified by phosphorylation (11, 30-32), with the novel N-terminal stretch of the nonmuscle isoform being a prominent site of phosphorylation by p60src (33). We have explored participation of the nonmuscle MYLK isoform in lung innate immunity and inflammatory responses and determined key involvement in regulating barrier function, fluid flow, inflammatory cell trafficking, the formation of new blood vessels, and vascular cell apoptosis (15-18, 34-37). For example, the role of MLCK in inflammatory lung edema formation involves carefully orchestrated MLCK-dependent actomyosin-based cytoskeletal rearrangement (16). MLCK inhibition prevents the increased lung permeability produced by thrombin (15, 16), TGF-β1 (38), activated PMNs (39), ischemia/reperfusion injury (40), and mechanical stress (19). Recently, studies in selective nmMYLK knockout mice demonstrated an essential role for MYLK in susceptibility to sepsis-induced ALI and as a potential drug discovery target (20). Furthermore, the chromosome location of MYLK (3q21) is an active site for several inflammatory disorders including asthma, allergic rhinitis, COPD and atopic dermatitis (Genetic Linkage Map: http://www.grc.nia.nih.gov/branches/rrb/dna/chromosome3.htm). Despite the apparent clinical importance of this multifunctional enzyme, the role of MYLK as a candidate gene in ALI has remained largely unexplored. 
The goal of the current study was to evaluate MYLK as a potential candidate gene and drug target for sepsis and ALI and to identify genetic variants in MYLK conferring risk of sepsis and/or ALI using simple case/control samples. A frequent systematic error in molecular epidemiologic studies involves imperfect sampling or classification procedures (41), a particular concern given the heterogeneous major risk factors in ALI (sepsis, multiple transfusions, trauma, pneumonia, burns, cardiopulmonary bypass, and pancreatitis) (42). We used clearly defined inclusion criteria for cases and control subjects, with all recruited patients with ALI developing severe physiologic derangements characteristic of ALI in the context of documented sepsis. We identified 51 SNPs among patients with ALI, patients with sepsis, or control subjects and assessed the importance of MYLK variations on the risk for sepsis and sepsis-induced ALI. Significant associations were observed between four MYLK SNPs and the sepsis phenotype in EAs by single-locus analyses. The associations in EAs of SNP rs820336, which is located in the first intron of the smooth muscle isoform of the gene, with increased risks of both sepsis and ALI were confirmed in a second AA population. We believe it likely that one or more functional MYLK variants in the region neighboring this particular SNP are responsible for susceptibility to both sepsis and ALI regardless of ethnicity. Studies to further characterize this region within MYLK are ongoing. For the particular replicated SNP rs820336 (A > G), frequencies of the "AA" homozygote in EAs are higher (60.7% in control subjects, 47.3% in patients with sepsis, and 45.5% in patients with ALI) compared with the frequencies of the mutant "GG" homozygote in the three subgroups (3.6%, 17.3%, and 13.6%, respectively). Carriers of the "GG" genotype have over 5-fold increased risk for sepsis and ALI. 
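To show how genotype frequencies translate into a fold-risk estimate, the sketch below computes crude (unadjusted) odds ratios from the rounded EA percentages quoted above, under two possible contrasts. This is our own arithmetic on published percentages, not a reconstruction of the authors' model, and the function names are hypothetical.

```python
def recessive_or(freq_cases: float, freq_controls: float) -> float:
    """Crude OR for carrying a genotype versus all other genotypes."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls


def homozygote_or(risk_cases: float, ref_cases: float,
                  risk_controls: float, ref_controls: float) -> float:
    """Crude OR for the risk homozygote versus the reference homozygote."""
    return (risk_cases / ref_cases) / (risk_controls / ref_controls)


# rs820336 genotype frequencies in EAs, as quoted in the text.
GG = {"control": 0.036, "sepsis": 0.173, "ali": 0.136}
AA = {"control": 0.607, "sepsis": 0.473, "ali": 0.455}

for group in ("sepsis", "ali"):
    gg_vs_rest = recessive_or(GG[group], GG["control"])
    gg_vs_aa = homozygote_or(GG[group], AA[group], GG["control"], AA["control"])
    print(f"{group}: GG vs rest = {gg_vs_rest:.2f}, GG vs AA = {gg_vs_aa:.2f}")
```

With these rounded inputs the GG-versus-AA contrast exceeds 5 for both phenotypes, whereas the GG-versus-rest contrast does so only for sepsis. Which contrast, if either, underlies the reported figure cannot be determined from the text alone.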
In contrast, the genotype frequencies were reversed in AAs: only 1.7% of the AA control subjects carried the "AA" genotype, as compared with 55.8% who carried the "GG" genotype. However, carriers of the "AA" genotype have 18-fold increased risk for both sepsis and ALI. It is not unusual for marker allele and haplotype frequencies to show considerable variability across populations, such that a "major" allele in one population is the "minor" allele in the other population. Given such frequency differences, the high-risk allele/haplotype can easily switch across populations. The low occurrence of "AA" in African Americans could be due to European admixture, and more common interactions with other genetic or environmental risk factors in AAs are likely to account for the greater risk in this group (43). Our haplotypic analysis results confirmed the findings from the single-locus analysis and further increased the power to discover regions in the N-terminal of the MYLK gene that specifically contribute to susceptibility to ALI in both ethnic groups. These results suggest that the defined regions or nearby regions may harbor causal variants that confer susceptibility to ALI. The CAC haplotype, which exists only in AAs (comprising SNPs hcv1602689-AOD29, MYLK_007, rs11707609-AOD24), is of particular interest. MYLK_007, which maps to exon 2 containing the transcription initiation site, is a nonsynonymous SNP conferring a proline to histidine amino acid change. The other nonsynonymous coding SNPs, proline to serine (MYLK_002) and valine to alanine (MYLK_003), were also associated with both phenotypes in AAs, and, as with MYLK_007, their functional consequences remain to be elucidated. These coding SNPs suggest the potential for major conformation changes in the enzyme, affecting either enzymatic activity or interactions with other regulatory proteins such as p60Src (33) and macrophage migration inhibitory factor (MIF), a well accepted biomarker for ALI (44). 
We previously demonstrated that p60Src-mediated phosphorylation of two key tyrosine residues in the N-terminus results in 3-fold enhancement of MLC kinase activity (33). Mechanistic in vitro biochemical and cellular studies and in vivo knock-in transgenic animal studies will be needed to fully understand the ramifications of these SNPs in the context of ALI pathophysiology and the racial disparity which exists in disease morbidity and mortality. Linkage disequilibrium is a complex function of a number of genetic and evolutionary factors (mutation, recombination, gene conversion rates), demographic and selective events, and the age of the mutation itself (45). Both the boundaries of haplotype blocks and the specific haplotypes observed are shared to a remarkable extent across populations. It has been suggested (29) that initial haplotype mapping in populations with longer-range LD might serve to make initial localization more efficient. "Tag" SNPs can then be selected and used in other studies, substantially reducing the time and effort for genotyping without losing significant haplotype information. In EAs, the MYLK gene region exhibits high levels of pairwise LD between neighboring SNPs and relatively low haplotype diversity, with five to six LD blocks accounting for the vast majority of EA chromosomes. The AA population is the result of relatively recent admixture between the European American and African populations, which is a potential source of extended LD (46). Correlation between extended intervals of LD and functional genomic elements has been observed (47). Both the CAG haplotype (28.8 kb) and the CTA haplotypes (19.8 kb) that were associated with ALI in our AA samples are quite extended compared with findings in the EAs. A small set of AAs replicated and supported our findings in EAs and added additional information that may contribute to the racial disparity in disease susceptibility and severity. 
We do not believe our findings can be explained by an admixture resulting in population stratification. However, it is also important to test and control for the genetic structure present in our AA population to avoid false positives. Since sepsis and sepsis-associated ALI predominantly affect middle-aged adults, recruitment of relatives is difficult, which eliminated the possibility of using a family-based design. An alternative approach involves using a set of unlinked genetic markers to infer details of population structure and to estimate the ancestry of sampled individuals (48). Ancestry informative markers (AIMs) are genetic loci with alleles that have high frequency differences between populations, defined for and specific to a particular admixture mapping application. AIMs can be used to estimate ancestry at the level of the population, subgroup (e.g., disease cases and control subjects), and individual (49). These studies confirmed the validity of not adjusting for population stratification. In summary, our results, involving association testing in two case-control designed populations, are consistent with the notion that case-control association studies are a useful tool for shedding light on the genetic basis of disease predisposition and outcome. Our study used single-locus and haplotypic analyses with SNP markers across the entire MYLK gene to provide valuable information in terms of study design, haplotypic analysis approach, and the requirement for adjusting for stratification in AAs. Significant single-locus and haplotypic associations were observed in EA and AA populations between MYLK SNPs and both the sepsis and sepsis-induced ALI phenotypes. In addition, results derived from multiple haplotypic analyses revealed an ALI-specific, risk-conferring haplotype in the 5' region of the nonmuscle MYLK gene in both EAs and AAs and an additional haplotype within the 3' region of the gene only in AAs. 
Consistent with the role of MYLK as a key modulator of inflammatory responses and a potential drug target, these data strongly implicate genetic variants in MYLK that confer increased risk of both sepsis and sepsis-associated ALI. The MYLK gene resides in 3q21, a genomic locale significantly associated with several inflammatory disorders, including asthma, allergic rhinitis, chronic obstructive pulmonary disease, and atopic dermatitis, suggesting that MYLK may represent a viable candidate gene in other inflammatory disorders. Although a potential weakness of our study is the relatively small sample size, and confirmatory studies (using larger populations) with fine scale mapping in defined MYLK regions are needed, these results provide needed validation of the candidate gene approach in complex disorders where family-based studies are not feasible.
The authors are grateful to all the study coordinators for recruitment of subjects into the Consortium to Evaluate Lung Edema Genetics (CELEG), and thank Dr. Daniele Fallin for statistical input, William Shao and Mohan Parigi for information technology support, and Nancy Cox, Ph.D. (University of Chicago) for helpful discussions.
This work was supported by an NHLBI Program Project grant (HL 58064), the HopGene Program in Genomic Applications (U01 HL66583), and a Specialized Center for Clinically-Oriented Research (SCCOR) award (HL 73994). L.G. is supported in part by an NIH T32 training grant.
This article has an online supplement, which is accessible from this issue's table of contents at www.atsjournals.org.
Originally Published in Press as DOI: 10.1165/rcmb.2005-0404OC on January 6, 2006.
Conflict of Interest Statement: None of the authors has a financial relationship with a commercial entity that has an interest in the subject of this manuscript.
Received in original form October 28, 2005. Accepted in final form December 5, 2005.