Mass Spectrometric Characterization of Histone H3 Isolated from in-Vitro Reconstituted and Acetylated Nucleosome Core Particle

Post-translational modifications (PTMs) of histone N-terminal tails in the nucleosome core particle (NCP), such as acetylation, play crucial roles in regulating gene expression. To unveil the regulatory mechanism, atomic-level structural analysis of in-vitro modified NCP is effective, provided that the PTMs of the histones are verified. So far, PTMs of NCP originating from living cells have mainly been identified using mass spectrometry (MS) techniques, such as the bottom-up approach. The bottom-up approach is the most established method for protein characterization, but it does not always provide sufficient information on the acetylated lysine residues in histone tails when trypsin digestion is used. Because histones contain many basic amino acids, trypsin generates many short fragments that cannot be fully analyzed by tandem MS. In this study, we investigated the in vitro acetylation sites in the histone H3 tail using a top-down sequence analysis, the matrix-assisted laser desorption/ionization in-source decay (MALDI-ISD) experiment, in combination with aminopeptidase digestion. Aminopeptidase cleaves peptide bonds one by one from the N-terminus of peptides or proteins, generating N-terminally truncated peptides and/or proteins. As a result, this method was found to enable sequence characterization of the entire H3 tail region. In addition, applying this method to H3 in in-vitro acetylated NCP enabled assignment of the acetylation sites of H3. Thus, this method is effective for obtaining information on the in-vitro acetylation of NCP for structural biology studies.


INTRODUCTION
In the eukaryotic nucleus, DNA is organized into densely packed chromatin, whose basic structural unit is the nucleosome core particle (NCP). [1][2][3] The NCP is composed of an octamer of the four histones (H2A, H2B, H3, and H4) wrapped around ca. 147 base pairs (bp) of DNA. The N-terminal region of each histone, the histone tail, contains a high proportion of basic amino acids and is unstructured. It has been demonstrated that post-translational modifications (PTMs) of the histone tails play crucial roles in regulating gene expression. [4][5][6] Histone acetylation, one of the PTMs of NCP, triggers transcriptional activation. 7) Because acetylation neutralizes the positive charge of the Lys ε-amino group, acetylation of basic residues such as lysine (Lys) in histone tails is expected to weaken the electrostatic interactions between the basic histones and the negatively charged DNA, loosening the NCP structure.
There are various histone acetyltransferases (HATs), which catalyze the transfer of an acetyl moiety from acetyl coenzyme A (acetyl-CoA) onto the ε-amino group of Lys; they regulate nucleosome assembly and chromatin folding by acting in the appropriate context. 7) Moreover, multiple modifications act cooperatively and simultaneously in this regulation. Therefore, to correctly understand the functional changes brought about by each modification, it is necessary to reproduce each modification in vitro and to characterize the structure-function relationship while controlling the extent of modification of NCP. X-ray crystallography has mainly been used to determine the atomic-level structure of modified NCP, while mass spectrometry (MS) is an essential tool for identifying these PTMs. 8,9) There are three MS-based approaches for analyzing protein sequences: bottom-up, top-down, and middle-down. 8,9) In the bottom-up approach, proteins are digested into peptides with a protease, mainly trypsin, and the peptides are analyzed by MS and tandem MS (MS/MS). The sequence of each peptide is then characterized from the MS and MS/MS data. In the top-down approach, intact proteins are isolated and fragmented within the mass spectrometer to obtain sequence information, which can be analyzed by database searching. The middle-down approach is intermediate between the bottom-up and top-down approaches: proteins are digested with a protease such as Glu-C, which usually generates polypeptides of 30-40 amino acids. The most popular and established approach is the bottom-up approach. However, because histone tails are rich in Lys and arginine (Arg), the peptides obtained by tryptic digestion are often too short to be analyzed by MS/MS.
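To illustrate why tryptic digestion breaks the H3 tail into peptides that are too short for MS/MS, the following minimal sketch performs an in silico digest of the unmodified N-terminal region of human H3 (residues 1-44), prefixed with the GSM linker retained by the recombinant H3 used in this study. It applies only the simplified cleavage rule (after Lys or Arg, but not before Pro) and ignores missed cleavages and the effect of modifications; the names `H3_TAIL_WITH_LINKER` and `trypsin_digest` are illustrative, not part of any published workflow.

```python
from __future__ import annotations

import re

# Human H3 residues 1-44 preceded by the "GSM" linker left after thrombin cleavage.
H3_TAIL_WITH_LINKER = "GSM" + "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHRYRPG"


def trypsin_digest(seq: str) -> list[str]:
    """In silico tryptic digest: cut C-terminal to K/R unless the next residue is P.

    Simplified rule only: no missed cleavages and no effect of modifications.
    """
    # Zero-width split points (Python 3.7+): lookbehind for K/R, negative lookahead for P.
    return [piece for piece in re.split(r"(?<=[KR])(?!P)", seq) if piece]


peptides = trypsin_digest(H3_TAIL_WITH_LINKER)
for pep in peptides:
    print(f"{pep:<10} {len(pep):>2} residues")
```

In this sketch, 12 of the 13 products are five residues or shorter, and three are a single Lys, which is in line with the limitation of the bottom-up approach for histone tails described above.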
In addition, it is difficult to completely characterize the global PTM pattern of a particular protein, because the relationship between multiple modifications cannot be determined from the sequence analysis of many short peptides that each carry only a single PTM. In contrast, the other two approaches can reveal the relationship between co-existing modifications, because relatively large peptides/proteins are analyzed. [10][11][12] However, collision-induced dissociation (CID), the fragmentation method available on most tandem mass spectrometers, is not well suited to fragmenting large polypeptides and proteins and cannot provide sufficient sequence information when used with these two approaches. For the top-down and middle-down approaches, fragmentation methods that cleave the peptide backbone through radical-driven processes, such as electron capture dissociation (ECD) 13) and electron transfer dissociation (ETD), 14) are effective.
In ECD and ETD, electron capture or transfer occurs at positively charged sites, such as basic amino acids, and at the π* antibonding orbital of the peptide bond, producing fragment ions around these sites. 15) However, it is not easy to identify all fragment ions derived from a large protein. Another alternative for sequence analysis of large peptides/proteins is matrix-assisted laser desorption/ionization in-source decay (MALDI-ISD). The choice of matrix is very important for generating informative fragment ions in MALDI-ISD. 16) Regardless of matrix selection, the fragment ions observed in the low m/z region mainly provide sequence information from the N- or C-terminus. Because the histone PTMs of interest lie mainly in the N-terminal tails, MALDI-ISD is therefore considered a promising tool for characterizing histone PTMs, as reported by Kwak and Dohmae. 17) Here, we characterized the sequence of H3, which has the longest tail among the four histones, by MALDI-ISD combined with aminopeptidase digestion. Aminopeptidase cleaves peptide bonds from the N-terminus of peptides or proteins, producing N-terminally truncated peptides and/or proteins. With this method, we successfully obtained sequence information for a long stretch of the H3 tail. Furthermore, we applied this method to H3 in NCP with and without in vitro acetylation by the HAT p300. It enabled complete sequence characterization of the acetylated H3 tail region in NCP reconstituted in vitro, which should facilitate structural biology studies of epigenetic regulation.

Separation of histones from NCP by HPLC
For histone isolation from NCP, denaturing buffer (1.6 mM NaCl, 2.1 M guanidine HCl, and 0.5% trifluoroacetic acid (TFA)) was first added to NCP. Denatured NCP was then applied to a C18 high-performance liquid chromatography (HPLC) trap column (2.1×10 mm, Waters, Milford, MA) at a flow rate of 0.4 mL/min. The histones were separated at room temperature on a C8 reversed-phase (RP) column (2.1×150 mm, 5 µm particle size; GL Sciences, Japan) connected in series with the C18 trap column. A 100-µL sample was injected, and protein peaks were detected by UV absorbance at 215 nm. Mobile phases A (5% acetonitrile with 0.1% TFA) and B (100% acetonitrile with 0.1% TFA) were delivered at a flow rate of 0.4 mL/min using the following gradient program: 0 to 5 min, 0% B; 5 to 10 min, 0-37% B; 10 to 15 min, 37% B; 15 to 30 min, 37-42.30% B; 30 to 40 min, 42.30-49.50% B; 40 to 50 min, 49.50-90% B; 50 to 60 min, 90% B; 60 to 65 min, 90-0% B. The fraction containing each histone was collected manually and concentrated under vacuum before digestion or MALDI-ISD experiments. The concentration of each fraction was calculated from its UV absorbance, measured with a NanoDrop 1000 (Thermo Fisher Scientific, Boston, MA). For digestion, the solvent of each fraction was exchanged to Milli-Q water by ultrafiltration.
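The gradient program above is piecewise linear, so the solvent composition at any time can be recovered by linear interpolation between its breakpoints. The minimal sketch below encodes the breakpoints as listed and interpolates %B; the names `GRADIENT` and `percent_b` are hypothetical helpers for illustration only.

```python
from __future__ import annotations

# (time in min, % solvent B) breakpoints of the C8 RP gradient described above.
GRADIENT = [
    (0.0, 0.0), (5.0, 0.0), (10.0, 37.0), (15.0, 37.0), (30.0, 42.30),
    (40.0, 49.50), (50.0, 90.0), (60.0, 90.0), (65.0, 0.0),
]


def percent_b(t_min: float) -> float:
    """Linearly interpolate %B at time t_min within the gradient program."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError(f"{t_min} min is outside the gradient program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover the whole range")


# Slope of each segment in %B per minute.
for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
    print(f"{t0:>4.0f}-{t1:<4.0f} min: {(b1 - b0) / (t1 - t0):+.2f} %B/min")
```

The printed slopes show that the 15-30 and 30-40 min segments (about 0.35 and 0.72 %B/min) are much shallower than the 5-10 min ramp (7.4 %B/min); presumably these shallow segments are where the histones are resolved, although their elution times are not given here.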

LC-MS
For the bottom-up approach, LC-MS measurements were performed in positive-ion mode using an LTQ Orbitrap XL mass spectrometer (Thermo Fisher Scientific, Boston, MA) equipped with a nano-LC system. A 5-µL aliquot of H3 solution (100 µM, 500 pmol) in 50 mM ammonium bicarbonate was mixed with 1 µL of trypsin solution (0.1 mg/mL) and incubated at 37°C overnight. After digestion, the tryptic peptides were desalted and concentrated with a ZipTip C18 (Merck, Germany) and then evaporated to dryness. The dried tryptic peptides were redissolved in 10 µL of 0.1% TFA. This solution was applied to a C18 trap column (300 µm I.D.×5 mm, Thermo Fisher Scientific, Boston, MA) at a flow rate of 200 nL/min. The tryptic peptides were separated at room temperature on a C18 RP column (75 µm I.D.×120 mm, 3 µm particle size; Nikkyo Technos Co., Ltd., Japan) connected in series with the C18 trap column. Mobile phases A (0.1% formic acid (FA)) and B (100% acetonitrile with 0.1% FA) were delivered at a flow rate of 200 nL/min using the following gradient program: 0 to 3
The laser power was optimized to obtain MALDI-ISD spectra with high signal-to-noise ratios (S/N) for the ISD ions.
For the MALDI-TOF experiments, sinapinic acid (SA) (10 mg/mL) was dissolved in water/acetonitrile (…)

RESULTS AND DISCUSSION
Sequencing H3 tail using bottom-up and top-down approaches
Figure 1 shows the amino acid sequence of human histone H3 used in the present study. Gray characters in Fig. 1 (36 residues starting from alanine 1 (Ala1)) indicate the tail region, identified as the region that was not visible in the X-ray crystal structure of reconstituted NCP. 26) Here we defined this region as the histone H3 tail. In the present study, recombinant H3 with a histidine tag was expressed in Escherichia coli. After removal of the histidine tag by thrombin cleavage, the purified H3 retained the linker sequence GSM, which had connected the histidine tag to the H3 sequence, at its N-terminus. First, the sequence of H3 was analyzed by the bottom-up approach. As indicated in Fig. 1, sequence information was obtained for residues 18-128, except for residues 70-72. However, no sequence information was obtained for the 20 N-terminal amino acids, although acetylation or methylation of Lys4, Lys9, and Lys14 is involved in regulating transcription initiation. 6) To obtain sequence information for this region, we first examined MALDI-ISD of recombinant H3, as shown in Figs. 2a and S-1. 1,5-Diaminonaphthalene (DAN), a reducing matrix, was used to generate c and z+2 ions, similar to those observed in ECD and ETD. [27][28][29] The observed peaks were assigned to c ions corresponding to fragments from the N-terminus to glycine 44 (Gly44) of H3, demonstrating that complete sequence information for the H3 tail region can be obtained by MALDI-ISD.
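The c-ion assignments can be cross-checked by computing the expected ladder. The sketch below is an illustration under two stated assumptions: the m/z of a singly protonated c ion is taken as the sum of monoisotopic residue masses plus NH3 and a proton (a common convention), and the N-terminal GSM linker is assumed to be contained in every c ion even though the numbering starts from Ala1. The names `c_ion_mz`, `LINKER`, and `H3_1_44` are hypothetical. Under these assumptions c4 falls at about m/z 749 and c18 at about m/z 2216, inside the m/z 700-4000 range of Fig. 2a and the m/z 2000-2350 inset of Fig. 4a, respectively; this is only a consistency check, not a reproduction of the measured spectra.

```python
from __future__ import annotations

from collections.abc import Iterable

# Monoisotopic residue masses (Da) for the amino acids occurring in this region.
RESIDUE_MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "Q": 128.05858,
    "K": 128.09496, "M": 131.04049, "H": 137.05891, "R": 156.10111,
    "Y": 163.06333,
}
NH3 = 17.02655      # ammonia (Da)
PROTON = 1.00728    # proton (Da)
ACETYL = 42.01057   # C2H2O added per acetylated Lys (Da)

LINKER = "GSM"  # retained at the N-terminus of the recombinant H3
H3_1_44 = "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHRYRPG"


def c_ion_mz(n: int, acetylated_lys: Iterable[int] = ()) -> float:
    """Singly protonated c_n m/z; n counted from Ala1, GSM linker included (assumption)."""
    if not 1 <= n <= len(H3_1_44):
        raise ValueError(f"n must be between 1 and {len(H3_1_44)}")
    mass = sum(RESIDUE_MONO[aa] for aa in LINKER + H3_1_44[:n])
    for pos in acetylated_lys:
        if not 1 <= pos <= len(H3_1_44) or H3_1_44[pos - 1] != "K":
            raise ValueError(f"position {pos} is not a Lys in residues 1-44")
        if pos <= n:  # the acetyl group is carried only if the Lys lies in the fragment
            mass += ACETYL
    return mass + NH3 + PROTON


for n in range(4, 19):
    print(f"c{n}: m/z {c_ion_mz(n):.2f}")
for lys in ((), (14,), (14, 18)):
    print(f"c18 with acetyl-Lys {lys}: m/z {c_ion_mz(18, lys):.2f}")
```

Because the mono-acetylated c18 has the same mass whether Lys14 or Lys18 carries the acetyl group, the site assignment relies on comparing consecutive c ions (for example c13 versus c14), as done in the text.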
Considering that H3 exists as a component of NCP and undergoes various PTMs in living cells, a method is needed that enables sequence characterization of H3 isolated from NCP. Accordingly, H3 was isolated from the reconstituted NCP by HPLC and analyzed by MALDI-ISD in the same way. About 160 pmol of each histone was recovered from 100 pmol of NCP injected into the HPLC system. Because each NCP contains two molecules of each histone, this corresponds to a recovery yield of approximately 80%. Figure S-2 shows the MALDI-ISD mass spectrum of H3 isolated by HPLC. The observed peaks were identified as c4-c17 ions. Histone H3 is highly basic because of its many Arg and Lys residues, especially in the tail region. Within NCP, H3 makes direct contacts with the negatively charged DNA. To dissociate the histones from DNA, NaCl and an unfolding reagent (guanidine HCl) were added to the NCP sample, which was then desalted using the trap column. However, even after desalting on the trap column, the basic residues in histones would likely have retained chloride ions because of their high basicity. ISD requires a positive charge on the analyte protein to initiate the reaction. When basic amino acids form salts with chloride ions, their basicity is lost, making it difficult to obtain sequence information by ISD, and the S/N of the fragment ions is reduced. Thus, H3 isolated from the reconstituted NCP gave fragment signals with poor S/N in the ISD spectrum. Therefore, it would be difficult to analyze the whole tail region of H3 in NCP by MALDI-ISD alone.

Fig. 1 legend (fragment): Gray bold characters indicate residues that could not be modeled in the crystal structure of the human NCP (PDB I.D. 3AFA). 26) Blue characters indicate the basic amino acids in H3. Blue arrows indicate the tryptic peptides obtained by the bottom-up approach. The linker sequence "GSM" is also indicated at the N-terminus of the H3 sequence.
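The recovery estimate for the HPLC step accounts for the two copies of each histone per NCP; written out as a worked equation:

```latex
\text{recovery yield}
  = \frac{n_{\text{histone, recovered}}}{2 \times n_{\text{NCP, injected}}}
  = \frac{160\ \text{pmol}}{2 \times 100\ \text{pmol}}
  = 0.80 \quad (80\%)
```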

Sequencing H3 tail using MALDI-ISD in combination with aminopeptidase digestion
To obtain detailed sequence information for the H3 tail in NCP, we applied MALDI-ISD in combination with aminopeptidase digestion. Aminopeptidase cleaves peptide bonds one by one from the N-terminus of peptides or proteins, resulting in N-terminally truncated peptides and/or proteins. For histone H3, combining aminopeptidase digestion with MALDI-ISD-MS was expected to provide sequence information for the region that cannot be read out by MALDI-ISD alone. Figure S-3 shows MALDI mass spectra of aminopeptidase digests of H3. The observed m/z value of 13808 was ca. 1730 lower than the original m/z value in Fig. S-3a. Aminopeptidase digestion for 6 h was sufficient to remove the 17 N-terminal residues from the histone H3 protein, as shown in Fig. S-3c. This decrease in mass is consistent with the specificity of aminopeptidase, which does not cleave the peptide bond to the imino group of proline. Because Ala15 is followed by Pro16, digestion stops there, and the N-terminal amino acid of the newly generated polypeptide is Ala15 of the H3 sequence (Fig. 1). We then subjected this N-terminally truncated H3 to MALDI-ISD. As shown in Fig. 2b, sequence information was obtained from Ala21 to isoleucine 51 (Ile51) of H3, a region that could not be analyzed from the ISD mass spectrum of intact H3 prepared from reconstituted NCP. In Fig. 2b, the baseline of the ISD spectrum was somewhat noisy. This might be due to fragmentation of peptides generated by non-specific cleavage by aminopeptidase.
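The stop rule described above (no cleavage of a bond to the imino group of Pro) is enough to predict where the truncation ends. The minimal sketch below simulates stepwise N-terminal removal on the GSM-prefixed H3 sequence and sums average residue masses (appropriate for intact-protein MALDI-TOF values) to obtain the expected mass decrease. It is a simplified model that assumes the X-Pro rule is the only constraint (no kinetics, no non-specific cleavage); `aminopeptidase_trim` and `RECOMBINANT_NTERM` are hypothetical names.

```python
from __future__ import annotations

# Average residue masses (Da), appropriate for intact-protein (linear MALDI-TOF) m/z values.
RESIDUE_AVG = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "T": 101.1051,
    "Q": 128.1307, "K": 128.1741, "M": 131.1926, "R": 156.1875,
}

RECOMBINANT_NTERM = "GSM" + "ARTKQTARKSTGGKAPRKQ"  # GSM linker + H3 residues 1-19


def aminopeptidase_trim(seq: str) -> tuple[str, str]:
    """Remove N-terminal residues one at a time until the next bond is X-Pro.

    Returns (removed residues, remaining sequence).
    """
    i = 0
    # Residue i can be released only if the bond to residue i+1 is not an X-Pro bond.
    while i < len(seq) - 1 and seq[i + 1] != "P":
        i += 1
    return seq[:i], seq[i:]


removed, remaining = aminopeptidase_trim(RECOMBINANT_NTERM)
mass_loss = sum(RESIDUE_AVG[aa] for aa in removed)
print(f"removed {len(removed)} residues ({removed}); new N-terminus {remaining[:2]}...")
print(f"expected mass decrease: {mass_loss:.1f} Da")
```

The model removes 17 residues and leaves Ala15-Pro16 at the N-terminus, matching the text; the theoretical decrease computed this way is about 1747 Da, which can be compared with the observed shift of ca. 1730.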

Identification of H3 in acetylated NCP
Next, to examine whether the modified sites of H3 can be identified by the above method, we acetylated the histone proteins in reconstituted NCP and then investigated the acetylated sites in H3. Figure 3 shows MALDI mass spectra of acetylated H3 in NCP with and without aminopeptidase digestion. In the mass spectrum of H3 (Fig. 3a), the doubly charged peaks observed around m/z 7700 suggest that H3 carried zero to four acetyl groups. For the N-terminally truncated H3, peaks around m/z 6900 were assigned to zero- to tri-acetylated H3, as shown in Fig. 3b. The maximum acetylation level of the N-terminally truncated H3 was thus one lower, suggesting that one acetylated Lys lies in the region from the N-terminus to Lys14, which was removed by aminopeptidase. Next, MALDI-ISD measurements were performed on the acetylated H3 with and without aminopeptidase digestion, as shown in Fig. 4. In the ISD spectrum of acetylated H3 (Fig. 4a), the observed peaks were identified as c5-c18 ions. Among these, the c14-c17 ions were observed in zero- and mono-acetylated forms, and the c18 ion in zero-, mono-, and di-acetylated forms (Fig. 4a, inset). Consequently, Lys14 and Lys18 were estimated to be partly acetylated, while Lys4 and Lys9 were not modified at all. In the spectrum of N-terminally truncated H3 (Fig. 4b), the observed peaks were assigned to c21-c31 ions, showing partial acetylation at Lys18, Lys23, and Lys27. Schiltz et al. reported, using microsequence analysis, that p300 acetylates Lys14, Lys18, and Lys23 of H3 in vitro, and that the extent of acetylation of Lys23 was lower than that of Lys14 and Lys18. 30) The differences in acetylation sites and levels between that study and ours might have been caused by differences in the histone-DNA interaction sites within the reconstituted NCP. In the present study, unmodified NCP was first prepared and then acetylated with the HAT p300. If DNA had not been tightly and uniformly wrapped around the histone octamer in the reconstituted NCP, the positive charges of basic amino acids in histones might have interacted with negative charges at different positions of the DNA strand, resulting in different acetylation sites and levels of H3.

Figure legend fragments: The mass spectra were obtained by AXIMA-CFR. Gray bold characters indicate residues that could not be modeled in the crystal structure of the human NCP (PDB I.D. 3AFA). 26) The numbers used for annotation of c ions start from Ala1 in the original H3 sequence shown in Fig. 1. Figure 2a shows the expanded spectrum for m/z 700-4000, while Fig. S-1 shows the same spectrum in the range of m/z 700-5000.
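The acetylation states read from Fig. 3 and Fig. 4 rest on simple mass arithmetic: each acetyl group adds C2H2O (about 42.01 Da monoisotopic, 42.04 Da average), so at charge 2+ the peaks of successive acetylation states are spaced by about 21 m/z units. The minimal sketch below predicts a doubly charged acetylation ladder from an [M+H]+ value (here the m/z 13808 reported for the N-terminally truncated H3) and converts a measured mass shift into a number of acetyl groups. The function names and the 0.05 Da tolerance are illustrative assumptions. Note that a tolerance this wide cannot distinguish acetylation from the nearly isobaric trimethylation (42.047 Da); the assignment here relies on the in vitro reaction being performed with a HAT.

```python
from __future__ import annotations

PROTON = 1.00728        # proton mass (Da)
ACETYL_AVG = 42.0367    # average mass of C2H2O added per acetylation (Da)
ACETYL_MONO = 42.01057  # monoisotopic mass of C2H2O (Da)


def mz_from_neutral(mass: float, z: int) -> float:
    """m/z of [M + zH]^z+ for a neutral mass M."""
    return (mass + z * PROTON) / z


def acetyl_ladder(mh_unmodified: float, max_acetyl: int, z: int = 2) -> list[float]:
    """Predicted m/z at charge z for 0..max_acetyl acetyl groups, from an [M+H]+ value."""
    neutral = mh_unmodified - PROTON
    return [mz_from_neutral(neutral + k * ACETYL_AVG, z) for k in range(max_acetyl + 1)]


def count_acetyl(mass_shift: float, acetyl: float = ACETYL_MONO, tol: float = 0.05) -> int | None:
    """Number of acetyl groups explaining a mass shift, or None if no integer fits within tol."""
    k = round(mass_shift / acetyl)
    return k if k >= 0 and abs(mass_shift - k * acetyl) <= tol else None


if __name__ == "__main__":
    # [M+H]+ of the N-terminally truncated H3 taken from the text (m/z 13808).
    for k, value in enumerate(acetyl_ladder(13808.0, max_acetyl=3)):
        print(f"{k} acetyl: m/z {value:.1f} (2+)")
```

The zero- to tri-acetylated 2+ ladder predicted this way falls at roughly m/z 6905-6968, in line with the peaks around m/z 6900 assigned in Fig. 3b.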
As described above, the method combining aminopeptidase digestion and MALDI-ISD was shown to be useful for identifying the modification sites in the long tail region of H3. Furthermore, our results demonstrate the importance of verifying the modified sites in each experiment, even when the site specificity has already been well characterized. Thus, this method should be effective for characterizing a variety of in-vitro modified NCPs prepared for structural biology studies.

CONCLUSION
In this study, we characterized the sequence of H3 by MALDI-ISD combined with aminopeptidase digestion. Although H3 is the longest histone and has the longest tail region among the four histone proteins that constitute the NCP, it was possible to analyze the entire tail region of H3. We also applied this method to H3 from acetylated NCP and identified the in-vitro acetylated sites of H3. Considering that there are many histone variants, such as CENP-A, and many important PTMs in H3 related to chromatin function, this method should be effective for characterizing the relationship between function and PTMs in the tail region of histone H3. Furthermore, we believe that this method will help in understanding the mechanism of transcriptional activation by various modifications that cause structural changes in NCP.

Fig. 4 legend (fragment): The mass spectra were obtained by Biflex IV. The inset panel in Fig. 4a shows the expanded mass spectrum in the range of m/z 2000-2350. The sequence above each mass spectrum shows the fragmentation sites of H3. Blue characters indicate Lys residues in each sequence. The numbers above these sequences indicate the positions of partly acetylated Lys residues. Annotations in blue show the number of acetyl groups included in each c ion shown in red.