Article ID: CJ-14-0628
Background: Mutations in at least 30 genes have been linked to hypertrophic cardiomyopathy (HCM). Due to the large size of the main HCM genes, Sanger sequencing is labor intensive and expensive. The purpose of this study was to develop a next-generation sequencing (NGS) procedure for the main HCM genes.
Methods and Results: Multiplex amplification of the coding exons of MYH7, MYBPC3, TNNT2, TNNI3, ACTC1, TNNC1, MYL2, MYL3, and TPM1 was designed, followed by NGS with the Ion Torrent PGM (Life Technologies). A total of 8 pools containing DNA from HCM patients were sequenced in a 2-step approach. First, a total of 60 patients (validation cohort) underwent both PGM and Sanger sequencing for the 9 genes. No false-negative variants were found on NGS (100% sensitivity), and a specificity of 97% and 80% was achieved for single-nucleotide and insertion/deletion variants, respectively. Second, the PGM was used to search for mutations in a total of 76 cases not previously studied (discovery cohort). A total of 19 putative mutations were identified in the discovery pools, which were confirmed and assigned to specific patients on Sanger sequencing.
Conclusions: An NGS procedure has been developed for the main sarcomeric genes that would facilitate the screening of large cohorts of patients. In addition, this procedure would facilitate the uncovering of rare gene variants on a population scale.
Mutations in at least 30 different genes have been found in patients with hypertrophic cardiomyopathy (HCM), with MYBPC3 and MYH7 accounting for approximately 50% of the mutations.1–4 Due to the large size of these genes, the Sanger sequencing of single amplicons is labor intensive and expensive. Next-generation sequencing (NGS) technologies could facilitate the genetic screening of the HCM genes in large cohorts of patients.5,6 Most of the reported NGS procedures are based on the polymerase chain reaction (PCR) amplification of the coding exons from each patient with primers that matched the flanking introns, followed by the pooling and digestion of the PCR products to achieve a readable size (commonly, <200 bp) and the ligation of a specific oligonucleotide (barcode) to each fragment.7–9 Because each patient can be recognized via the barcode, it is possible to sequence many different patients in a single array. In practice, this means that NGS of a large number of patients would require many different PCR and barcoding assays. One way to reduce the experimental time required and the cost would be to perform multiplex amplification (all the target sequences in a few tubes) of DNA pools. The putative mutations found in a pool could be further assigned to a specific individual by Sanger sequencing of the corresponding exon in all the individuals used to create the pool (Figure 1). In spite of the labor- and cost-saving of this approach, a main limitation of the NGS of DNA pools is that rare nucleotide variants could be diluted by the wild-type allele to a level too low to be detected (false negatives).10 Other authors, however, considered this a valid approach to search for mutations in mendelian disorders.11
Flow-chart for the next-generation sequencing of DNA pools to characterize mutations in the main hypertrophic cardiomyopathy (HCM) genes.
The purpose of this study was to develop and validate a procedure for sequencing the most commonly mutated genes in HCM, based on 2-tube multiplex amplification of DNA pools and NGS with the Ion Torrent semi-conductor (non-optical) Personal Genome Machine (PGM; Life Technologies). This approach would facilitate the rapid and cost-effective search for rare DNA variants in large numbers of individuals.
This research, including the informed consent forms and procedures, was approved by the Ethics Committee of Hospital Universitario Central Asturias (HUCA). All the patients were Caucasian and from the region of Asturias (Northern Spain) and gave their written informed consent to participate in the study, which was recorded in the patient’s clinical history.
HUCA is a reference center for the genetic studies of HCM in Spain. The study involved a total of 136 unrelated HCM index cases, recruited through the Cardiology Department of HUCA in the period 2001–2013. HCM was diagnosed based on clinical symptoms and a left ventricular septum (LVS) thickness >15 mm in the absence of any other condition that could explain the hypertrophy (such as hypertension). Patients with at least 1 relative who had also been diagnosed with HCM were defined as familial cases.
Patients were divided into 3 groups (Table 1): (1) validation Sanger to NGS (n=26), previously Sanger sequenced for the coding exons (plus at least 5 intronic flanking nucleotides) of MYH7, MYBPC3, TNNT2, TNNI3, ACTC1, TNNC1, MYL2, MYL3, and TPM1;10–12 (2) validation NGS to Sanger (n=34), first sequenced on NGS and then via Sanger for the 9 genes; or (3) discovery (n=76), patients not previously sequenced or only partly sequenced, who underwent NGS followed by Sanger sequencing of the exons containing putative mutations.
Characteristics | Mean±SD or n (%) |
---|---|
Mean age at diagnosis (years) | 48±13 |
Range | 19–76 |
Male | 81 (60) |
HCM history | 58 (43) |
Mean BMI | |
Male | 26±3 |
Female | 24±4 |
Mean IVS | 21±5 |
Mean PWT | 12±5 |
Mean LVWT | 32±6 |
Dyspnea | 90 (66) |
NYHA index | |
Class I–II | 63 (46) |
Class III–IV | 27 (20) |
Angina | 47 (35) |
Syncope | 37 (27) |
Atrial fibrillation | 31 (23) |
Arrhythmia (Holter monitoring) | 45 (33) |
LVOTO >30 mmHg | 49 (36) |
BMI, body mass index; HCM, hypertrophic cardiomyopathy; IVS, interventricular septum; LVOTO, left ventricular outflow tract obstruction; LVWT, left ventricular wall thickness; NYHA, New York Heart Association; PWT, posterior wall thickness.
DNA was obtained following a salting-out method, resuspended in water, and adjusted to a final concentration of 10 ng/µl using Real Time Taqman quantification with RNase P Detection Reagents (FAM™; Life Technologies) in a 7500 Real Time PCR-System (Applied Biosystems). Using this procedure, we also confirmed that all the DNA samples were suitable for amplification.
Eight DNA pools containing 10 µl of each of the corresponding DNA samples were produced (Table S1). Pool 1 consisted of 13 patients from the Sanger to NGS group, each carrying a unique nucleotide variant, either a mutation or a polymorphism (control variants), that would thus be present at an allele frequency of 1/26 in the pool. Three pools (2–4) with 12–16 samples per pool consisted of 4 patients from the Sanger to NGS group who harbored a mutation (control variants) plus 34 patients from the NGS to Sanger group. Finally, 4 pools (A–D) with 20–25 samples per pool consisted of 12 patients from the Sanger to NGS group, plus 76 patients from the discovery group.
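As a rough check on the dilution expected in each pool, the following Python sketch computes the fraction of alleles carrying a variant that is present in a single heterozygous individual. It assumes diploid samples pooled in equal amounts; the function name is illustrative and not part of the original procedure.

```python
def singleton_allele_fraction(n_individuals: int, ploidy: int = 2) -> float:
    """Expected allele fraction of a variant carried by one heterozygous
    individual in an equimolar pool (assumption: equal DNA input per sample)."""
    if n_individuals < 1:
        raise ValueError("a pool needs at least one individual")
    return 1.0 / (ploidy * n_individuals)


# Pool sizes taken from the text: 13 (pool 1) and 25 (largest discovery pool).
for n in (13, 25):
    print(f"{n} individuals: {singleton_allele_fraction(n):.4f}")
```

Under these assumptions a singleton in the 13-patient pool is expected at 1/26 (about 0.038), and in a 25-patient pool at 1/50 (0.02).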
Multiplex (Ampliseq) Amplification
A 2-tube multiplex amplification for the coding sequence exons plus at least 5 intronic flanking nucleotides (approximately 16 kb) of MYH7, MYBPC3, TNNT2, TNNI3, ACTC1, TNNC1, MYL2, MYL3, and TPM1 genes was designed online (Ion AmpliSeq™ Designer; https://www.ampliseq.com). We compared several primer design options and ordered the one that gave the maximum target sequence coverage. Primer pairs to amplify a total of 176 fragments were provided by the manufacturer in only 2 tubes. The amplicons covered 99% of the target sequence (Tables S2,S3).
Each DNA pool was amplified with the Ion Ampliseq™ Library Kit in conjunction with Ion Ampliseq™ Custom Primer Pool protocols according to the manufacturer's procedures (Life Technologies), with the following steps: PCR in 2 tubes, partial digestion of the primers with FuPa Reagent, ligation of the barcode adapters (only for the discovery cohort pools), purification by Agencourt® AMPure® XP Reagent, PCR with the adapters using Platinum® PCR SuperMix High Fidelity enzyme (Invitrogen), purification by Agencourt® AMPure® XP Reagent, quantification of the sample (Agilent Bioanalyzer Instrument and Qubit® 2.0 Fluorometer), and dilution of the sample to a final concentration of 20 pmol/L.
Template preparation, emulsion PCR, emulsion breaking, and enrichment were performed using the Ion PGM™ Template OT2 200 kit following the manufacturer's instructions (Life Technologies). Briefly, a total of 10 ng of the DNA pool was amplified in 2 Ampliseq tubes using the Ion Ampliseq™ Library Kit. The reactions were quantified (Agilent Bioanalyzer) and then emulsion PCR was done using the Ion PGM Template OT2 200 Kit and the Ion One-Touch instrument (Life Technologies). Template-positive spheres were recovered using Dynabeads MyOne Streptavidin C1 beads and quantified using the Ion Sphere™ Quality Control Assay and the Qubit 2.0 fluorometer (Life Technologies).
We performed 3 massive parallel sequencing experiments: pool 1 was sequenced with an Ion Torrent 316 (100-Mb) array. Each of the NGS to Sanger (2–4) and discovery (A–D) pools was individually barcoded and sequenced in two 318 (1,000-Mb) arrays.
NGS was performed using the PGM 200 sequencing kit protocol in the Ion Torrent PGM. We used 260-flow runs, which support a template read length of approximately 200 bp. The number of samples used to create the pools was decided taking into account the load capacity of the array, the total length of the target sequences (approximately 16 kb), the dilution of a unique rare allele inside the pool, and the number of reads per amplicon necessary to achieve a theoretical minimum 50× coverage.
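The trade-off between pool size and array capacity can be sketched numerically. The Python example below is a simplified illustration, not the calculation actually used: it assumes perfectly uniform coverage across amplicons, equimolar pooling of diploid samples, and one read per amplicon copy. The function names are hypothetical; the 176 amplicons, the approximately 200-bp read length, and the 50× per-allele target are taken from the text.

```python
def required_amplicon_depth(n_individuals: int,
                            per_allele_coverage: int = 50,
                            ploidy: int = 2) -> int:
    """Reads needed per amplicon so that each allele in the pool is
    covered per_allele_coverage times (assumes uniform representation)."""
    return per_allele_coverage * ploidy * n_individuals


def required_yield_mb(n_individuals: int,
                      n_amplicons: int = 176,
                      read_length_bp: int = 200,
                      per_allele_coverage: int = 50) -> float:
    """Approximate sequence yield (Mb) needed for one pool."""
    depth = required_amplicon_depth(n_individuals, per_allele_coverage)
    return depth * n_amplicons * read_length_bp / 1e6


for n in (13, 25):
    print(n, required_amplicon_depth(n), required_yield_mb(n))
```

Under these idealized assumptions a pool of 25 individuals needs about 2,500 reads per amplicon, or roughly 88 Mb for the whole panel. Real runs have uneven amplicon representation, so the practical requirement is higher.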
NGS Data Analysis
The raw PGM data were processed with Torrent Suite v3.4.2 (Life Technologies) to generate sequence reads filtered by the pipeline software quality controls. Read assembly and variant identification were done with Variant Caller (VC) v3.4.51874, using FastQ files containing sequence reads and the Ion Ampliseq Designer BED file to map the amplicons. The Integrative Genomics Viewer (IGV, Broad Institute) was used for the analysis of depth coverage, sequence quality, and variant identification. Variants were identified with the somatic sample VC default algorithm. We considered 3 coverage categories: at <20× coverage per allele, amplicons were discarded and the corresponding exons were Sanger sequenced in each individual; at 20–50× coverage per allele, amplicons were considered admissible and the BAM files were visualized to confirm the read quality and the nucleotide variants; and at >50× coverage per allele, amplicons were considered optimal.
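The three coverage categories can be expressed as a small decision rule. The sketch below is illustrative only. It assumes that per-allele coverage is obtained by dividing the total amplicon depth by the number of alleles in the pool (2 per individual), which is one plausible reading of the text rather than a documented step, and the function and label names are hypothetical.

```python
def per_allele_coverage(amplicon_depth: float, n_individuals: int,
                        ploidy: int = 2) -> float:
    """Assumed conversion: total reads on an amplicon divided by the
    number of alleles contributing to the pool."""
    return amplicon_depth / (ploidy * n_individuals)


def classify_amplicon(coverage_per_allele: float) -> str:
    """Apply the thresholds stated in the text (<20x, 20-50x, >50x)."""
    if coverage_per_allele < 20:
        return "discard"      # exon is Sanger sequenced in each individual
    if coverage_per_allele <= 50:
        return "admissible"   # BAM files inspected to confirm the variants
    return "optimal"


# Example depths for a 13-sample pool (26 alleles); values are made up.
for depth in (400, 1000, 2600):
    cov = per_allele_coverage(depth, n_individuals=13)
    print(depth, classify_amplicon(cov))
```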
Because insertions/deletions (INDELS) are frequently not detected by the PGM (and other NGS platforms), we performed a specific analysis to reduce the risk of missing true INDELS (false negatives). The somatic sample VC default algorithm was set to low sample coverage, minimum allele frequency and minimum variant frequencies of 10,000, 0.01, and 0.01 respectively. In addition, the BAM files of amplicons containing putative INDELS were visualized and those that mapped to only 1 strand were discarded.
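The strand criterion was applied by visual inspection of the BAM files; the same rule could be written as a simple filter. The following Python sketch is a hypothetical illustration of that rule only. The record layout, field names, and example read counts are assumptions and do not correspond to any Variant Caller output format.

```python
from dataclasses import dataclass


@dataclass
class IndelCall:
    """Hypothetical summary of one putative INDEL call."""
    name: str
    forward_reads: int   # variant-supporting reads on the forward strand
    reverse_reads: int   # variant-supporting reads on the reverse strand


def supported_on_both_strands(call: IndelCall) -> bool:
    """Keep a call only if variant reads are seen on both strands."""
    return call.forward_reads > 0 and call.reverse_reads > 0


# Read counts below are invented for illustration.
calls = [
    IndelCall("MYBPC3 c.671_673delTGC", forward_reads=41, reverse_reads=37),
    IndelCall("example one-strand call", forward_reads=52, reverse_reads=0),
]
kept = [c.name for c in calls if supported_on_both_strands(c)]
print(kept)
```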
Sanger Sequencing and Putative Mutation Assignment
Nucleotide variants that fulfilled the following criteria were considered as putative mutations: had a functional effect (missense, nonsense or frameshifting amino acid changes; pre-mRNA splicing); reported in the Human Gene Mutation Database; or classified as likely pathogenic on bioinformatics in silico prediction (Polyphen and SIFT). For each putative mutation in the 7 discovery pools the corresponding DNA samples were individually amplified and sequenced with BigDye chemistry using ABI3130 equipment (Life Technologies) to identify the mutation carrier. Briefly, the exon containing the nucleotide variant was amplified with primers that matched the flanking introns and PCR fragments were purified and sequenced.13,14
A DNA pool-based strategy was carried out, using custom target multiplex amplification for the main HCM genes (MYH7, MYBPC3, TNNT2, TNNI3, ACTC1, TNNC1, MYL2, MYL3, and TPM1) in only 2 tubes, followed by massive parallel sequencing in the Ion Torrent PGM sequencer. As a first validation step, a pool of DNA from 13 patients (pool 1) was amplified and PGM sequenced in the medium-capacity (100-Mb) 316 semiconductor chip. By processing a single pool we reduced the cost and labor time of individual amplification, barcoding, and library preparation of the 13 DNA samples.
The array load density was 73% with 344 Mb of nucleotide reads, a mean read length of 138 bp, and 98% of the amplicons having >100× coverage (Table S4). A total of 172 of the 176 amplicons (98%) had optimal reads, and only 4 (2%) had null or poor reads (Figure 2). Among the non-readable amplicons, 2 corresponded to exon 1 of TPM1, 1 to exon 12 of MYBPC3, and 1 to exon 6 of ACTC1. Because these failures were replicated in the 318 arrays and the sequences of the primer pairs were correct, we concluded that the absence of nucleotide reads for the 4 amplicons was likely due to some characteristic that made them refractory to amplification, such as a high GC content (3 of the 4 amplicons had a GC content >60%; Table S5).
Nucleotide reads for the amplicons of the 9 genes.
The VC identified a total of 45 single-nucleotide variants (SNV) in the validation pool 1 (Table S6). All the control variants in readable amplicons were detected at a threshold frequency >1% (allele frequency range, 3.35–5.42%; Table 2). The VC also identified all the variants in readable amplicons previously found through Sanger sequencing of the 13 patients. We identified 2 MYH7 variants that were not found on Sanger sequencing: c.T136>C and c.T5345>A. These nucleotide changes were thus false positives. On visual inspection of the BAM files using IGV, c.T136>C overlapped 2 amplicons, and most of the c.5345A reads were in forward strand sequences (Figure S1).
Gene | Nucleotide position | Exon/Intron | cDNA | Effect | Rare variant (%) |
---|---|---|---|---|---|
MYH7 | 23898994 | Exon 12 | c.1128C>T | p.D376D | 4.02 |
MYH7 | 23900093 | Intron 10 | c.895+17G>A | None | 4.43 |
MYBPC3 | 47358997 | Exon 24 | c.2547C>T | p.V849V | 4.01 |
MYBPC3 | 47360053 | Intron 22 | c.2308+18C>G | None | 3.54 |
MYBPC3 | 47365049 | Exon 13 | c.1217G>A | p.S406N | 3.76 |
MYBPC3 | 47370076 | Exon 6 | c.671_673delTGC | p.L224fs | 3.35 |
MYBPC3 | 47371598 | Exon 4 | c.472G>A | p.V158M | 3.85 |
TNNT2 | 201330429 | Exon 14 | c.758A>G | p.K260R | 4.49 |
TNNI3 | 55667958 | Intron 3 | c.150+13G>A | None | 4.88 |
TNNI3 | 55668992 | Exon 1 | c.–35C>A | None | 5.42 |
MYL2 | 111353556 | Exon 3 | c.132T>C | p.I44I | 3.78 |
MYL3 | 46902491 | Intron 1 | c.130–14G>T | None | 3.89 |
TPM1 | 63353098 | Exon 5 | c.523G>A | p.D175N | 5.13 |
NGS, next-generation sequencing.
The VC identified the only INDEL control variant (MYBPC3 c.671_673delTGC; Table 2). In addition, the VC identified 10 INDELS, but only c.53-11_53-7delCTTCTT in TNNT2 was found on Sanger sequencing (Table S7). On inspection using IGV, the other 9 were false positives that mapped to only 1 strand or lay in homopolymer regions longer than 5 nucleotides.
Sensitivity and Specificity
To extend the validation, pools 2–4 containing a total of 37 HCM patients underwent NGS followed by Sanger sequencing of the 9 genes. The main data of the Ion PGM runs are summarized in Table S4. In the 3 pools, we replicated the 4 sequencing failures (<20× coverage) previously observed in validation pool 1. The VC identified a total of 52, 56, and 49 SNV in the 3 pools, respectively (Tables S8–S10). We excluded the occurrence of false-negative SNV on Sanger sequencing of the 9 genes in all patients used to create the pools. Moreover, we identified a total of 49 variants that were either mutations or polymorphisms seen in only 1 sample inside the corresponding pool (Table 3). Together, these data confirmed the accuracy of the method in avoiding false negatives. The VC also identified the 2 false-positive SNV previously found in validation pool 1. With regard to the INDELS, no false negatives and 3 false positives were found (Tables S11–S13).
Gene | Position | Exon/Intron | cDNA | Effect | Frequency | DNA pool |
---|---|---|---|---|---|---|
MYH7 | 23886055 | Intron 33 | c.4644+22G>A | None | 4.5 | 2 |
MYH7 | 23886064 | Intron 33 | c.4644+12_4644+13delTG | None | 4.8 | 1 |
MYH7 | 23886064 | Intron 33 | c.4644+12_4644+13delTG | None | 4.2 | 2 |
MYH7 | 23886155 | Exon 33 | c.4544T>C | p.T1522T | 2.6 | 2 |
MYH7 | 23886264 | Intron 32 | c.4520–63G>A | None | 3.4 | 1 |
MYH7 | 23886504 | Exon 32 | c.4377G>T | p.K1459N | 2.8 | 1 |
MYH7 | 23888371 | Intron 29 | c.3972+15C>T | None | 3.2 | 3 |
MYH7 | 23892799 | Exon 24 | c.3062C>A | p.T1019N | 4.0 | 2 |
MYH7 | 23892950 | Intron 23 | c.2923–18G>A | None | 4.1 | 3 |
MYH7 | 23893034 | Intron 23 | c.2922+82C>T | None | 4.2 | 3 |
MYH7 | 23893995 | Exon 22 | c.2662C>A | p.Q888K | 3.3 | 3 |
MYH7 | 23897077 | Exon 16 | c.1605A>G | p.E535E | 4.1 | 3 |
MYH7 | 23898994 | Exon 12 | c.1128C>T | p.D376D | 3.9 | 1 |
MYH7 | 23898994 | Exon 12 | c.1128C>T | p.D376D | 4.3 | 2 |
MYH7 | 23898994 | Exon 12 | c.1128C>T | p.D376D | 4.7 | 3 |
MYH7 | 23899027 | Exon 12 | c.1095G>A | p.K365K | 2.9 | 1 |
MYH7 | 23899038 | Exon 12 | c.1084G>A | p.M362V | 2.8 | 1 |
MYH7 | 23899793 | Exon 11 | c.975C>T | p.D325D | 4.5 | 2 |
MYH7 | 23900093 | Intron 10 | c.895+17G>A | None | 4.5 | 1 |
MYH7 | 23901012 | Exon 7 | c.597A>G | p.A199A | 4.6 | 1 |
MYH7 | 23901012 | Exon 7 | c.597A>G | p.A199A | 3.6 | 2 |
MYH7 | 23901922 | Exon 5 | c.428G>A | p.R143Q† | 4.0 | 2 |
MYBPC3 | 47356615 | Exon 26 | c.2883G>A | p.P961P | 4.5 | 3 |
MYBPC3 | 47357416 | Intron 25 | c.2737+12C>T | None | 5.7 | 2 |
MYBPC3 | 47358997 | Exon 24 | c.2547C>T | p.V849V | 3.3 | 2 |
MYBPC3 | 47359014 | Exon 24 | c.2531_2532 insGA | p.M844fs† | 5.5 | 1 |
MYBPC3 | 47359014 | Exon 24 | c.2531_2532 insGA | p.M844fs† | 5.3 | 3 |
MYBPC3 | 47360053 | Intron 22 | c.2308+18C>G | None | 3.4 | 1 |
MYBPC3 | 47360053 | Intron 22 | c.2308+18C>G | None | 4.9 | 2 |
MYBPC3 | 47360133 | Exon 22 | c.2246G>A | p.Y749C | 5.5 | 1 |
MYBPC3 | 47364138 | Exon 18 | c.1615A>G | p.I539V | 3 | 1 |
MYBPC3 | 47362642 | Intron 18 | c.1847+47G>A | None | 2.8 | 2 |
MYBPC3 | 47362642 | Intron 18 | c.1847+47G>A | None | 5.2 | 3 |
MYBPC3 | 47364129 | Exon 16 | c.1624G>C | p.E542Q | 4.4 | 3 |
MYBPC3 | 47364248 | Exon 16 | c.1505G>A | p.R502Q | 2.6 | 3 |
MYBPC3 | 47364975 | Intron 13 | c.1223+68C>T | None | 2.8 | 3 |
MYBPC3 | 47367823 | Exon 12 | c.1025T>A | p.V342D | 2.6 | 1 |
MYBPC3 | 47369443 | Exon 7 | c.786C>T | p.T262T | 3.6 | 3 |
MYBPC3 | 47370037 | Exon 6 | c.710A>G | p.Y237C† | 3.6 | 1 |
MYBPC3 | 47370041 | Exon 6 | c.706A>G | p.S236G | 3.3 | 3 |
MYBPC3 | 47370074 | Exon 6 | c.671_673delTGC | p.L224fs† | 4.4 | 2 |
MYBPC3 | 47370074 | Exon 6 | c.671_673delTGC | p.L224fs† | 4.6 | 3 |
MYBPC3 | 47371598 | Exon 4 | c.472G>A | p.V158M | 5.7 | 1 |
MYBPC3 | 47371598 | Exon 4 | c.472G>A | p.V158M | 4.2 | 3 |
TNNT2 | 201328272 | Exon 16 | c.*66G>A | None | 3 | 2 |
TNNT2 | 201330429 | Exon 14 | c.758A>G | p.K260R | 4.5 | 2 |
TNNI3 | 55665410 | Exon 6 | c.537G>A | p.E179E | 4.8 | 3 |
TNNI3 | 55668397 | Intron 2 | c.180+21G>A | None | 4 | 3 |
MYL2 | 111351974 | Intron 4 | c.274+16_274+17insCT | None | 2.2 | 3 |
MYL2 | 111353556 | Exon 3 | c.132T>C | p.I44I | 3.1 | 2 |
MYL3 | 46901019 | Exon 4 | c.427G>A | p.E143K | 3.7 | 1 |
MYL3 | 46902491 | Intron 1 | c.130–14G>T | None | 3.7 | 1 |
TPM1 | 63335074 | Exon 1 | c.46G>C | p.E16Q | 3.4 | 3 |
TPM1 | 63353451 | Exon 6 | c.689+313A>G | p.A216A | 2.7 | 1 |
TPM1 | 63356237 | Intron 8 | c.898+1393C>T | None | 5.5 | 1 |
TPM1 | 63358033 | Intron 9 | c.898+3189delT | None | 5.2 | 1 |
TPM1 | 63356331 | Exon 9 | c.841A>G | p.M281V | 3 | 2 |
ACTC1 | 35083251 | Intron 6 | c.990+64C>T | None | 3.8 | 2 |
†Known control variants. NGS, next-generation sequencing.
A total of 60 patients (pools 1–4) underwent both NGS and Sanger sequencing, with 100% sensitivity of the NGS (no false-negative variants). With regard to the specificity, there were only 2 false-positive SNV (MYH7 c.T136>C and c.T5345>A). As expected, the number of false positives was higher for the INDELS: a total of 20 fulfilled the criteria for being deconvoluted and only 4 were false positives (80% specificity).
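The INDEL figures quoted above can be reproduced with simple arithmetic. The Python sketch below is only a worked check: the reported 80% is recovered as (20 − 4)/20, that is, the fraction of called INDELS that were confirmed on Sanger sequencing, and the 100% sensitivity follows from the absence of false negatives. The function names are illustrative.

```python
def fraction_confirmed(n_called: int, n_false_positive: int) -> float:
    """Fraction of called variants that were confirmed on Sanger sequencing."""
    return (n_called - n_false_positive) / n_called


def sensitivity(n_true_positive: int, n_false_negative: int) -> float:
    """Fraction of true variants that were detected by NGS."""
    return n_true_positive / (n_true_positive + n_false_negative)


# INDEL counts from the text: 20 calls, 4 false positives, no false negatives.
indel_confirmed = fraction_confirmed(20, 4)
indel_sensitivity = sensitivity(20 - 4, 0)
print(indel_confirmed, indel_sensitivity)
```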
PGM of the Discovery Pools
After determining the accuracy to detect rare variants in the validation pools, we sequenced a total of 86 patients in 4 discovery pools (A–D; 20–25 samples per pool). In addition to patients not previously or only partly Sanger sequenced (and negative for mutations; n=76), each pool also contained the DNA from 2–4 patients with known mutations. The main data of the Ion PGM runs are summarized in Table S4. We replicated the 4 NGS failures (<20× coverage) previously observed in the validation pools. Thus, exon 1 of TPM1, exon 12 of MYBPC3, and exon 6 of ACTC1 should be Sanger sequenced in all patients as part of mutation screening.
At a 1% threshold the VC identified all the known control mutations in the 4 pools (Table S14; an Excel file with all the variants is available upon request from the corresponding author). We also confirmed and assigned to specific patients, through Sanger sequencing of the corresponding exon, a total of 19 rare nucleotide changes that were classified as probably damaging or variants of uncertain effect (Figure 3; Table 4; Table S15). We found at least 1 putative mutation in 17 of the 76 patients. Two patients were carriers of 2 different variants (MYBPC3 p.A216T+p.E258K, and MYH7 p.L620P+p.K1459N). We also excluded the presence of additional patients with any of the control variants by sequencing the corresponding exon from all the patients in each pool. The VC also reported the 2 false MYH7 SNV in the 4 discovery pools.
Integrative Genomics Viewer and Sanger electropherogram of variants found in the discovery pools, including a G insertion (p.V931fs) in MYBPC3, and nucleotide changes (p.E143Q and p.Y749C) in MYL3 and MYBPC3.
Gene | Nucleotide position | Exon/Intron | cDNA | Effect | ESP frequency | HGMD | HCM history | SIFT | Polyphen |
---|---|---|---|---|---|---|---|---|---|
MYH7 | 23886504 | Exon 32 | c.4377G>T | p.K1459N | 1/4300 | Yes | No | Damaging | Probably damaging |
MYH7 | 23895023 | Exon 20 | c.2167C>G | p.R723C | No | Yes | No | Damaging | Probably damaging |
MYH7 | 23896042 | Exon 18 | c.1988G>A | p.R663H | 1/4300 | Yes | Yes | Damaging | Possibly damaging |
MYH7 | 23896823 | Exon 16 | c.1859T>C | p.L620P† | No | No | No | Damaging | Probably damaging |
MYH7 | 23902931 | Exon 3 | c.11C>T | p.S4L | No | Yes | No | Tolerated | Benign |
MYBPC3 | 47355475 | Exon 27 | c.2992C>G | p.Q998E | No | Yes | Yes | Damaging | Probably damaging |
MYBPC3 | 47359046 | Exon 24 | c.2498C>T | p.A833V | 2/4265 | Yes | No | Tolerated | Probably damaging |
MYBPC3 | 47360133 | Exon 22 | c.2246G>A | p.Y749C† | No | No | No | Damaging | Probably damaging |
MYBPC3 | 47367816 | Exon 12 | c.1032C>A | p.D344E† | No | No | No | Damaging | Possibly damaging |
MYBPC3 | 47369442 | Exon 7 | c.787G>T | p.G263X† | No | No | Yes | – | – |
MYBPC3 | 47369975 | Exon 6 | c.772G>A | p.E258K | No | Yes | Yes | Damaging | Possibly damaging |
MYBPC3 | 47371333 | Exon 5 | c.646G>A | p.A216T | 1/4204 | Yes | Yes | Tolerated | Benign |
MYBPC3 | 47371414 | Exon 5 | c.565G>A | p.V189I | 29/4208 | No | No | Tolerated | Benign |
MYBPC3 | 47371619 | Exon 4 | c.451G>A | p.D151N† | No | No | No | Tolerated | Benign |
MYBPC3 | 47373032 | Exon 2 | c.50G>A | p.R17Q | 1/4204 | No | No | Tolerated | Possibly damaging |
TNNT2 | 201328348 | Exon 16 | c.848G>A | p.R283H | 1/4299 | Yes | Yes | Damaging | Probably damaging |
TNNT2 | 201328373 | Exon 16 | c.823C>T | p.R275C | 5/4299 | Yes | No | Damaging | Probably damaging |
TNNI3 | 55665463 | Exon 6 | c.484C>T | p.R162W | 1/4300 | Yes | Yes | Damaging | Probably damaging |
ACTC1 | 35085599 | Exon 3 | c.301G>A | p.E101K | No | Yes | No | Damaging | Probably damaging |
†Previously non-reported at Ensembl (www.ensembl.org, Release 74). HCM, hypertrophic cardiomyopathy; HGMD, Human Gene Mutation Database.
The 2 INDEL control variants in the discovery pools were successfully detected. In addition, a total of 13 putative INDELS with possible functional effect were identified. After applying the analysis parameters we concluded that none of the 13 putative INDELS fulfilled the quality criteria to be considered true: they mapped in only 1 strand, were present at a high frequency in the 2 arrays, and none of them was reported in the gene variation databases. Moreover, to validate the procedure we Sanger sequenced the patients in the corresponding pools and confirmed that all were false positives.
Because the 4 discovery pools were composed of 20–25 patients while the validation pools contained 12–16 different DNA samples, we Sanger sequenced the 9 genes in all the patients from the largest discovery pool to confirm that there were no false negatives when a larger number of patients was included in a pool. Pool A contained (in addition to 2 control samples) DNA from 23 patients who had been partly sequenced for MYH7, MYBPC3, TNNT2, TPM1, and TNNI3. After completing the Sanger sequencing of the 9 genes in these cases, we confirmed the absence of SNV not identified in the NGS and the only 2 false-positive MYH7 changes (Table S16). In reference to the INDELS, we also confirmed the absence of false negatives, and only the TNNT2 c.136-49_136-48insA (previously identified in other pools) false positive was identified (Table S17).
The Ion Torrent PGM is a semiconductor (instead of optical) sequencer.15–17 The reported PGM procedures are based on the PCR amplification of DNA from single patients followed by pooling and barcode labeling of each patient’s fragments and NGS sequencing.7,18,19 We developed a procedure to perform multiplex amplification with a custom Ampliseq panel designed to amplify the main HCM genes in only 2 tubes. This procedure avoids the necessity of multiple amplifications per patient, but at the cost of poor or no amplification for some of the exons. Only 4 out of the 176 amplicons failed to amplify and give sequence reads. Although the corresponding exons should be Sanger sequenced from each patient as part of the mutation screening, we consider that this represents a minimum cost compared to the advantage of amplifying >98% of the amplicons in only 2 tubes (Figure 1). It could be argued that the multiplex amplification might be optimized by redesigning the primer pairs for the non-readable amplicons. The maximum read capacity of the PGM (and other NGS procedures), however, is currently limited to approximately 200 bp, and this results in practical constraints in designing PCR primers around the targeted exons, especially when they are embedded in GC-rich regions. This strategy of multiplex amplification followed by PGM sequencing has already been tested in an HCM cohort.20 Compared to the present study, however, all the patients were previously Sanger sequenced, the covered genes were not the same, and the authors amplified each patient’s DNA individually.
In addition to amplifying all the target exons in only 2 tubes, we also sequenced DNA pools. This approach avoids processing each individual separately, which would reduce the cost of screening large numbers of individuals. The amplification of single fragments from DNA pools has been used to re-sequence and discover rare variants linked to common diseases, as well as to uncover mutations in mendelian disorders.21–23 Although DNA pools have been successfully sequenced with other NGS platforms, some authors concluded that barcoding of individual samples before pooling (rather than a genomic DNA pooling strategy) is preferred to avoid false negatives.10 If a fragment was not amplified from a particular DNA in the pool, the patient would be wrongly classified as a non-mutation carrier. We think this was unlikely in the present study because we excluded the occurrence of false negatives in 60 patients who were both Sanger and NGS sequenced for the 9 genes. Also, all the pools used high-quality DNA previously assayed and quantified through Taqman assays. In addition, the power to detect rare mutations could be increased by sequencing overlapping population pools in which each individual occurs in 2 pools.24 One of the main limitations of the PGM (and other NGS platforms) is the presence of false positives, mainly in homopolymer regions >4 nucleotides. With the present VC criteria the specificity for INDELS was 80%, with a sensitivity of 100% in detecting all the true variants inside the pools.
Once the procedure was validated, we searched for mutations in a total of 76 patients distributed in 4 pools. In addition to new cases, we also included patients who had been partially sequenced or studied through indirect techniques (such as single-strand conformation analysis). We successfully identified the control variants in these discovery pools, and also found a total of 19 rare nucleotide changes that were classified as probably damaging or variants of uncertain effect. All of them were confirmed and assigned to 17 patients via Sanger sequencing of the corresponding exon. Two patients were double mutation carriers, a condition that has been linked to poor prognosis.25 In a total of 11 of these index cases we performed family studies and identified a total of 26 mutation carriers (data not shown).
The validation pools contained fewer samples than the discovery pools, and it might be argued that this could affect the results in terms of lower specificity and sensitivity when larger pools undergo NGS. We think this was unlikely in the present study, because the 9 genes were Sanger sequenced in all the 25 patients from the largest pool and we did not find false-negative variants, and the false positives were the same as in the validation pools. Although we were able to identify unique nucleotide variants in a pool of 25 individuals (1/50 alleles), the lower limit of detection was not determined. Thus, it is possible that rare unique variants could be read over the sequencing noise level in even larger pools.
An issue to consider when searching for rare variants in DNA pools is the number of Sanger sequences required to characterize the mutation carriers in the pool. Among the 4 discovery pools, the largest number of mutations (n=6) was found in pool D, containing 22 patients. Thus, a total of 132 PCR fragments were amplified and 264 Sanger reads (forward+reverse strands) generated. In comparison, a total of 128 amplicons and 5,632 sequence reads would be required for the Sanger sequencing of the 9 genes in the 22 patients. In the case that the number of putative mutations equals the number of samples (N) in the pool and all the mutations are in different amplicons, a total of N² single amplicons would need to be Sanger sequenced. Identifying all the putative mutations in this way would be impractical in cost and labor time for large DNA pools. In this study we created 8 libraries to sequence a total of 136 HCM patients for the price of only approximately €2,400 in library preparation. If we had performed this study using individually barcoded samples, the cost of library preparation would have risen to €40,800, 17-fold greater. In addition, 2 technicians carried out the NGS and Sanger assignment of the putative mutations in the 76 patients from the discovery pools in only 4 weeks, a time much shorter than that required to sequence the 9 genes in all the patients on Sanger sequencing.
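The workload and cost figures in this paragraph follow from a few multiplications, which the Python sketch below reproduces as a worked check. It assumes that each putative mutation lies in a different amplicon and that the per-library cost is the same for a pool and for an individually barcoded sample (the assumption implied by the 17-fold figure). Function names are illustrative.

```python
def sanger_workload(n_fragments_per_patient: int, pool_size: int,
                    strands: int = 2) -> tuple[int, int]:
    """Return (PCR fragments, Sanger reads) for one pool."""
    fragments = n_fragments_per_patient * pool_size
    return fragments, fragments * strands


def library_costs(pooled_total: float, n_pools: int,
                  n_patients: int) -> tuple[float, float, float]:
    """Return (cost per library, total if each patient had its own
    library, fold increase), assuming equal cost per library."""
    per_library = pooled_total / n_pools
    individual_total = per_library * n_patients
    return per_library, individual_total, individual_total / pooled_total


# Pool D: 6 putative mutations deconvoluted across 22 patients.
print(sanger_workload(6, 22))        # (132, 264)
# Full Sanger sequencing: 128 amplicons in each of the 22 patients.
print(sanger_workload(128, 22)[1])   # 5632
# Worst case described in the text: N mutations in a pool of N samples.
print(sanger_workload(22, 22)[0])    # 484, i.e. N squared
print(library_costs(2400, 8, 136))   # (300.0, 40800.0, 17.0)
```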
Finally, NGS technology has generated high-throughput sequencing data in genes linked to cardiomyopathies, in patients and in apparently healthy individuals.6,9 The Exome Sequencing project found rare variants (ESP database, http://evs.gs.washington.edu) previously identified in HCM patients, a fact that has questioned the pathogenicity of some missense nucleotide changes previously considered as mutations.26,27 At this stage, the present procedure would facilitate rapid and cost-effective sequencing of the main HCM-associated genes in large sets of individuals.
We report the massive parallel sequencing of the genes most commonly mutated in HCM patients. The present procedure would facilitate the rapid and cost-effective screening of these genes on a population scale.
This work was supported by a grant from Instituto de Salud Carlos III-Fondo Europeo de Desarrollo Regional (FIS-12/00287; RD12/0021/0012).
The authors declare no conflict of interest.
Author Contributions: J.R.R., C.M., and M.M. recruited the patients and performed the clinical and echographic studies; J.G., V.A., B.A., S.I., and E.C. performed the genetic studies; J.G. and E.C. wrote the manuscript. All the authors have seen and approved the final version of the manuscript.
Supplementary File 1
Table S1. Samples used to create the 8 pools
Table S2. Ampliseq amplicons and coverage details
Table S3. Ampliseq for the 9 genes
Table S4. Ion Torrent PGM run characteristics (n=3)
Table S5. GC content in 4 unreadable amplicons
Table S6. Rare SNV in Sanger to NGS validation pool 1 (n=13)
Table S7. INDELS in Sanger to NGS validation pool 1
Table S8. Rare SNV in NGS to Sanger validation pool 2
Table S9. Rare SNV in NGS to Sanger validation pool 3
Table S10. Rare SNV in NGS to Sanger validation pool 4
Table S11. INDELS variants in NGS to Sanger validation pool 2
Table S12. INDELS variants in NGS to Sanger validation pool 3
Table S13. INDELS variants in NGS to Sanger validation pool 4
Table S14. Control variants in discovery pools
Table S15. Rare variants in discovery pools A–D
Table S16. Rare SNV in discovery pool A
Table S17. INDELS variants in discovery pool A
Figure S1. Examples of the Integrative Genomics Viewer for false-positive single-nucleotide variants and indels, with the corresponding Sanger sequence.
Please find supplementary file(s):
http://dx.doi.org/10.1253/circj.CJ-14-0628