Biological and Pharmaceutical Bulletin
Online ISSN : 1347-5215
Print ISSN : 0918-6158
ISSN-L : 0918-6158
Regular Articles
Next-Generation Sequencing of Protein-Coding and Long Non-protein-Coding RNAs in Two Types of Exosomes Derived from Human Whole Saliva
Yuko Ogawa, Masafumi Tsujimoto, Ryohei Yanoshita

2016 Volume 39 Issue 9 Pages 1496-1507


Exosomes are small extracellular vesicles containing microRNAs and mRNAs that are produced by various types of cells. We previously used ultrafiltration and size-exclusion chromatography to isolate two types of human salivary exosomes (exosomes I, II) that differ in size and proteome. We showed that salivary exosomes contain large repertoires of small RNAs. However, the long RNAs in salivary exosomes have not been characterized in detail. In this study, we investigated the compositions of protein-coding RNAs (pcRNAs) and long non-protein-coding RNAs (lncRNAs) of exosome I, exosome II and whole saliva (WS) by next-generation sequencing technology. Although only 11% of all RNAs were detected in all three samples, the compositions of reads mapping to known RNAs were similar. pcRNAs encoding ribosomal proteins were the most abundant class of pcRNAs, and pcRNAs of some salivary proteins such as S100 calcium-binding protein A8 (protein S100-A8) were present in salivary exosomes. Interestingly, lncRNAs of pseudogenes (presumably processed pseudogenes) were abundant in exosome I, exosome II and WS. The translationally controlled tumor protein gene, which plays an important role in cell proliferation, cell death and immune responses, was highly expressed both as pcRNA and as pseudogenes in salivary exosomes. Our results show that salivary exosomes contain various types of RNAs, including pseudogene transcripts and small RNAs, and may mediate intercellular communication by transferring these RNAs to target cells as regulators of gene expression.

Human whole saliva (WS) is a complex aqueous mixture of proteins, peptides, hormones, metabolites, DNAs and RNAs. WS helps maintain the integrity of the oral cavity through its lubricating, antibacterial, antiviral and buffering actions, and facilitates chewing and swallowing. It plays an important role in front-line body defense. Because WS can be collected simply, cheaply and noninvasively, it has been used to monitor human health and disease. However, because saliva contains ribonucleases and nucleases from various sources that reduce RNA stability, relatively few studies have analyzed human salivary RNA.1) Salivary nucleic acids are protected from degradation by inclusion in extracellular vesicles such as exosomes. Recently, RNAs and DNAs in WS have attracted attention in biomarker research, disease diagnostics and forensic science.2–4)

Exosomes are small (30–100 nm) membrane vesicles of endocytic origin that are released into the extracellular environment upon fusion of multivesicular bodies with the plasma membrane. Exosomes are present in various body fluids, including blood, breast milk, malignant ascites, urine, amniotic fluid and saliva.5) Exosomes can contain the proteins and nucleic acids of their cell of origin and can transfer these contents to distant recipient cells.5) Exosomes are now thought to be secreted by many cell types, and numerous exosomal proteins, RNAs and lipids are registered in Vesiclepedia, a database of extracellular vesicles.6) Previous studies have shown that exosomes also contain protein-coding RNAs (pcRNAs), usually referred to as mRNAs, and small non-coding RNAs (sncRNAs) such as microRNAs (miRNAs). These RNA molecules can be transferred to other cells and remain functional in their new environment.7) Because exosomes can transfer information from secreting cells to other cells, they are attracting attention in the study of cancer metastasis, immune reactions and biomarkers.8–10)

Although more than 90% of the human genome is transcribed into RNA, only about 2% is translated into proteins.11) Non-coding RNAs (ncRNAs) do not encode proteins but function directly as RNA molecules in the cell. ncRNAs are generally classified by size: sncRNAs are shorter than 200 bases, and long ncRNAs (lncRNAs) are longer than 200 bases.11) miRNAs are a class of small (17–25 nucleotides) single-stranded sncRNAs that control gene expression in animals, plants and unicellular eukaryotes. In addition to miRNAs, the following regulatory ncRNA gene types are annotated in the Ensembl database: transfer RNAs (tRNAs), transfer RNAs encoded in the mitochondrial genome (Mt-tRNAs), ribosomal RNAs (rRNAs), piwi-interacting RNAs (piRNAs), small cytoplasmic RNAs (scRNAs), small nuclear RNAs (snRNAs), small nucleolar RNAs (snoRNAs), miscellaneous other RNAs (misc_RNAs) and long intergenic non-coding RNAs (lincRNAs), along with pseudogenes12) (Ensembl Asia website). However, the function of each ncRNA class has yet to be fully elucidated.

We previously isolated two types of extracellular vesicles from human WS by ultrafiltration and gel-filtration column chromatography.13,14) Although our method is not a universally accepted procedure for purifying exosomes, Wyss et al. recently showed that ultrafiltration and size-exclusion chromatography are valid methods for purifying intact exosomes.15) Because the exosomal marker proteins Alix, TSG101, HSP70 and CD63 were detected in both vesicle preparations by Western blot analysis, we designated the vesicles exosomes I and II. Exosome I was derived from the void fraction of the column, and exosome II from the second, small protein peak with high dipeptidyl peptidase-4 (DPP4) activity. The mean diameters of exosomes I and II were 83.5 nm and 40.5 nm, respectively, as measured by transmission electron microscopy. Our proteome and small RNA transcriptome analyses14,16) showed that the protein and sncRNA components of exosomes I and II did not completely coincide. Next-generation sequencing (NGS) showed that, in addition to miRNAs, reads mapping to sncRNAs of rRNAs, piRNAs, snoRNAs, short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs) and 'repeats' (including long terminal repeats, LTRs) were abundant in the two types of exosomes and in WS.

In recent years, many studies have performed NGS analysis of exosomal RNA.17–19) However, most of these analyses targeted known RNAs such as mRNAs and miRNAs, and studies of lncRNAs in exosomes remain limited.20) In this study, we investigated the long RNA components (pcRNAs and lncRNAs) of exosomes I, II and WS by NGS, using the total RNA samples from our previous study of the small RNA transcriptome.16)


Isolation of Total RNA from Human WS

Ethical approval was obtained from the institutional review board of Teikyo Heisei University (approval number 26-088). Total RNA was isolated from WS as described previously.16) Briefly, human WS was collected from a single healthy female volunteer of Japanese origin (38 years old) from our laboratory, who gave written informed consent for this specific study. Total RNA was isolated from 1 mL of WS using the RNeasy Protect Saliva Mini Kit (Qiagen, Valencia, CA, U.S.A.) according to the manufacturer's instructions. The quantity and quality of total RNA were assessed using a bioanalyzer (Agilent 2100; Agilent Technologies, Palo Alto, CA, U.S.A.).

Salivary Exosome Isolation and RNA Isolation

Exosomes were purified from WS as previously described.14) Briefly, 30 mL of WS was mixed with an equal volume of Tris-buffered saline (20 mM Tris–HCl, pH 7.4, 150 mM NaCl). Cell debris and oral bacteria were pelleted and removed by centrifuging the sample at 8000×g for 5 min at room temperature. Part of the supernatant was used for RNA isolation from WS. The supernatant was filtered through a 5 µm cellulose acetate filter and concentrated by ultrafiltration using an Amicon Ultra-15 centrifugal filter device with a 100 kDa cutoff (Millipore Corporation, MA, U.S.A.). The concentrate was subjected to gel filtration on a Sephacryl S-500 column (GE Healthcare, Buckinghamshire, U.K.) equilibrated with Tris-buffered saline. The void fractions (exosome I) and the subsequent fractions with high DPP4 activity (exosome II) were collected and concentrated using Amicon Ultra-4 devices with a 100 kDa cutoff. Purification was performed seven times from independently collected WS samples, and the purified exosome samples were pooled for RNA isolation. Total RNA was isolated using the RNeasy Protect Saliva Mini Kit (Qiagen) according to the manufacturer's instructions. The extracted RNA was concentrated, and the quantity and quality of total RNA were assessed using an Agilent 2100 Bioanalyzer. The total RNA concentrations of exosomes I, II and WS were 0.21, 0.10 and 4.2 ng/mL of saliva, respectively.16)

RNA Library Construction and Sequencing

RNA samples isolated from exosomes I, II and WS were amplified using the RampUP RNA Amplification Kit (Genisphere, PA, U.S.A.) according to the manufacturer's protocol, starting from 2 ng of total RNA per sample. The amplification proceeded as follows. First, cDNA was generated from all RNA molecules by reverse transcription with dT primer and/or random primers. Next, a poly(A) tail was added to the cDNA using 2′-deoxyadenosine 5′-triphosphate (dATP) and terminal deoxynucleotidyl transferase. The T7/T3 template oligonucleotide was annealed to the 3′ end of the cDNA, and Klenow enzyme filled in the 3′ end of the first-strand cDNA to produce a double-stranded T7/T3 promoter. In vitro transcription with T7 RNA polymerase then generated sense RNA copies of the original RNA molecules. The sense RNA was reverse transcribed to produce a second cDNA, after which the sense RNA was degraded by RNase H. The second cDNA was annealed to the T3 template oligonucleotide, and in vitro transcription with T3 RNA polymerase was performed to generate sense RNA.

Sequencing libraries were prepared from the amplified sense RNA according to the manufacturer's instructions (Illumina Inc., San Diego, CA, U.S.A.). Briefly, first-strand cDNA was synthesized by reverse transcription, and second-strand cDNA was then synthesized from the first-strand cDNA. After treatment with Klenow DNA polymerase, a poly(A) tail was added and the adaptor oligonucleotide was ligated. The cDNA product was size-fractionated by agarose gel electrophoresis, and the fraction of 200–400 nucleotides was extracted. The resulting cDNA was amplified and sequenced for 51 cycles (single-end) on the Illumina Genome Analyzer IIx (Illumina) by Hokkaido System Science Co., Ltd. (Sapporo, Japan).

RNA Genome Mapping

Low-quality bases and primer/adaptor sequences were removed from the reads as follows: bases with an estimated quality value (QV) lower than 10 were trimmed, and adaptor sequences were removed using Cutadapt version 1.1. Trimmed reads shorter than 36 bases were then discarded.
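The trimming and length filter described above can be sketched in Python as follows. This is a simplified stand-in, not Cutadapt's actual algorithm: it assumes Phred+33 quality encoding, trims low-quality bases only from the 3′ end, and cuts an adaptor only where its first 10 bases occur exactly in the read. The adaptor sequence is a hypothetical placeholder.

```python
from typing import Optional

MIN_QV = 10        # bases with QV below this are trimmed (from the text)
MIN_LENGTH = 36    # trimmed reads shorter than this are discarded (from the text)
ADAPTOR = "AGATCGGAAGAGC"  # hypothetical placeholder adaptor sequence


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode an ASCII quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality]


def trim_read(seq: str, quality: str) -> Optional[str]:
    """Return the trimmed read, or None if it ends up shorter than MIN_LENGTH."""
    scores = phred_scores(quality)
    # Simplified quality trimming: drop bases with QV < MIN_QV from the 3' end.
    end = len(seq)
    while end > 0 and scores[end - 1] < MIN_QV:
        end -= 1
    seq = seq[:end]
    # Simplified adaptor removal: cut at the first exact match of the
    # adaptor's leading 10 bases (Cutadapt tolerates errors; this does not).
    hit = seq.find(ADAPTOR[:10])
    if hit != -1:
        seq = seq[:hit]
    return seq if len(seq) >= MIN_LENGTH else None
```

For example, a 50-base read whose last 10 bases have quality `#` (QV 2) is cut back to 40 bases and kept, whereas a read with only 30 bases before the adaptor is discarded.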

The reads were mapped to the human Ensembl GRCh37.p6 genome assembly using TopHat. TopHat first maps non-junction reads (those contained within exons) to the human genomic sequence using Bowtie, allowing at most two mismatches or indels of up to 2 bases. Reads that did not map were then used by TopHat to find splice junctions without a reference annotation, by splitting them into 25-base segments, each of which was allowed up to two mismatches.

The expression levels of the mapped sequences were quantified as RPKM (reads per kilobase of exon per million mapped reads) using Cufflinks version 1.3.0. Genes with RPKM values of ≥10 were used for further analysis.
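For reference, RPKM normalizes a gene's read count by its exon length in kilobases and by the library size in millions of mapped reads, so RPKM = reads × 10^9 / (exon length in bases × total mapped reads). The sketch below applies that definition and the RPKM ≥10 cutoff to hypothetical counts. Cufflinks itself estimates abundances with a statistical model rather than this direct ratio, so its values need not match a naive calculation.

```python
def rpkm(read_count: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon per million mapped reads."""
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


def genes_above_cutoff(counts: dict[str, tuple[int, int]],
                       total_mapped_reads: int,
                       cutoff: float = 10.0) -> dict[str, float]:
    """Keep genes with RPKM >= cutoff; counts maps gene -> (reads, exon length in bp)."""
    values = {gene: rpkm(n, length, total_mapped_reads)
              for gene, (n, length) in counts.items()}
    return {gene: v for gene, v in values.items() if v >= cutoff}


# Hypothetical example: 5 million mapped reads in the library.
example = {"GENE_A": (1000, 2000), "GENE_B": (20, 4000)}
print(genes_above_cutoff(example, 5_000_000))  # {'GENE_A': 100.0}
```

GENE_B has an RPKM of 1.0 and is therefore excluded by the ≥10 cutoff.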

Pseudogenes were surveyed using the Pseudogene.org database. Reads annotated as 'pseudogene' by Ensembl were collected, and the chromosomal position of each gene was looked up in Pseudogene.org. Reads whose gene start and end positions differed by no more than 100 bases between the two databases were designated 'nearest pseudogenes.' When counting nearest pseudogenes, redundant parental genes were removed, because a single pcRNA often has more than one related pseudogene.
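The matching rule above can be sketched as follows. The record layout (chromosome, start, end, parental gene, RPKM) and all identifiers are hypothetical. The sketch treats an Ensembl pseudogene and a pseudogene-database entry as the same locus when both start and end positions differ by at most 100 bases. It then removes redundancy by keeping one pseudogene per parental gene, the one with the highest RPKM, which mirrors how Table 3 reports the highest-RPKM nearest pseudogene.

```python
from dataclasses import dataclass

TOLERANCE = 100  # maximum start/end difference in bases, as in the text


@dataclass(frozen=True)
class Locus:
    chrom: str
    start: int
    end: int


def same_locus(a: Locus, b: Locus, tol: int = TOLERANCE) -> bool:
    """True if both coordinates agree within tol bases on the same chromosome."""
    return (a.chrom == b.chrom
            and abs(a.start - b.start) <= tol
            and abs(a.end - b.end) <= tol)


def nearest_pseudogenes(ensembl: dict, database: dict) -> dict:
    """Map each parental gene to its highest-RPKM nearest pseudogene.

    ensembl:  {ensembl_id: (Locus, rpkm)}     pseudogenes annotated by Ensembl
    database: {database_id: (Locus, parent)}  entries from the pseudogene database
    Returns {parent: (ensembl_id, database_id, rpkm)}, one entry per parental gene.
    """
    best = {}
    for ens_id, (ens_locus, rpkm) in ensembl.items():
        for db_id, (db_locus, parent) in database.items():
            if not same_locus(ens_locus, db_locus):
                continue
            # Remove redundancy: keep only the highest-RPKM pseudogene per parent.
            if parent not in best or rpkm > best[parent][2]:
                best[parent] = (ens_id, db_id, rpkm)
    return best
```

The nested loop is quadratic and only meant to show the rule. For genome-scale inputs, an interval index per chromosome would be the usual choice.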

Gene ontology (GO) analysis was performed using CateGOrizer for the biological process and molecular function categories, with multiple occurrences counted. Results with unknown annotations were excluded from the counts.
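The tallying step can be illustrated with a short sketch. CateGOrizer's own mapping of GO terms to its class list is not reproduced here. Instead, the sketch assumes each gene has already been assigned zero or more GO classes, with "unknown" used as a hypothetical label for unannotated terms. It drops those unknown occurrences and reports each class's share of all counted occurrences. The gene names and assignments are hypothetical.

```python
from collections import Counter


def category_percentages(assignments: dict[str, list[str]]) -> dict[str, float]:
    """Share (%) of all GO-class occurrences that fall in each class.

    A gene can contribute to several classes (multiple occurrences are counted);
    occurrences labeled "unknown" are dropped before the totals are computed.
    """
    counts = Counter(
        go_class
        for classes in assignments.values()
        for go_class in classes
        if go_class != "unknown"
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {go_class: 100 * n / total for go_class, n in counts.items()}


# Hypothetical gene-to-class assignments.
example = {
    "gene1": ["binding", "catalytic activity"],
    "gene2": ["binding"],
    "gene3": ["unknown"],
    "gene4": ["binding", "unknown"],
}
print(category_percentages(example))  # {'binding': 75.0, 'catalytic activity': 25.0}
```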


Sequencing and Annotation of RNAs from Salivary Exosomes and Whole Saliva (WS)

Bioanalyzer profiles of the total RNA isolated from exosomes I, II and WS were described previously.16) A broad range of RNA sizes was detected in all three samples, and the rRNA (18S, 28S) peaks were lower in our samples than in urinary exosomes.20) Figure 1 shows an overview of the analysis of pcRNAs and lncRNAs in the two types of salivary exosomes and WS. Total RNA yields of exosomes I and II from 210 mL of WS were 44 and 21 ng, respectively, and the total RNA yield from 1 mL of WS was 4.2 ng. Because these RNA amounts were too low for direct NGS analysis, 2 ng of total RNA from each sample was amplified using random and dT primers. Bioanalyzer profiles of the amplified RNA showed a fragment peak slightly below 200 nucleotides (Supplementary Fig. S1). Because the Illumina Genome Analyzer IIx can accommodate a maximum of 400 nucleotides, RNAs of 200–400 nucleotides were used to construct the cDNA libraries of large RNA. The amounts of amplified sense RNA obtained from exosomes I, II and WS were 944.4, 17578.8 and 1053.6 ng, respectively.
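As a consistency check, dividing each total RNA yield by the volume of saliva it came from reproduces the concentrations given in the exosome isolation procedure. The 210 mL corresponds to the seven purifications of 30 mL of WS. The snippet only performs that arithmetic on the reported numbers.

```python
# Reported total RNA yields (ng) and the saliva volumes (mL) they came from.
yields_ng = {"exosome I": 44.0, "exosome II": 21.0, "WS": 4.2}
volume_ml = {"exosome I": 210.0, "exosome II": 210.0, "WS": 1.0}  # 7 x 30 mL for exosomes

per_ml = {sample: yields_ng[sample] / volume_ml[sample] for sample in yields_ng}
for sample, value in per_ml.items():
    print(f"{sample}: {value:.2f} ng/mL of saliva")
# exosome I: 0.21 ng/mL of saliva
# exosome II: 0.10 ng/mL of saliva
# WS: 4.20 ng/mL of saliva
```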

Fig. 1. Overview of the Analysis of Protein-Coding RNAs and Long Non-protein Coding RNAs in Two Types of Salivary Exosomes (Exosomes I, II) and Human Whole Saliva (WS)

We performed NGS on the RNAs of exosomes I, II and WS using Illumina high-throughput RNA sequencing technology. The RNA amplified from each sample was reverse transcribed and sequenced. After removal of low-quality regions, adaptors and potential contaminants, we obtained a total of 9935476, 7684370 and 6283177 sequence reads from exosomes I, II and WS, respectively (Table 1; DNA Data Bank of Japan Accession No. DRA003516). The reads generated in this study were subjected to cluster and assembly analyses using TopHat and Cufflinks.

Table 1. Numbers of Mapped Reads
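| Sample | Total filtered reads | Unique-Total (%) | Unique-Genome (%) | Unique-Bridged (%) | Multiple (%) | Unmapped (%) |
|---|---|---|---|---|---|---|
| Exosome I | 9935476 | 3103431 (31.2) | 3050922 (30.7) | 52509 (0.53) | 287134 (2.89) | 6544911 (65.9) |
| Exosome II | 7684370 | 2277858 (29.6) | 2055046 (26.7) | 222812 (2.90) | 383641 (4.99) | 5022871 (65.4) |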

RNAs were prepared from salivary exosome (exosomes I, II) and WS. Numbers of reads and their percentages of the total number of filtered reads (Total filtered reads) are shown as the reads mapped to the human genome (Unique-Genome), the reads mapped uniquely to a predicted exon–exon bridging sequence (Unique-Bridged), the total number of reads mapped uniquely to the genome and to a predicted exon–exon bridging sequence (Unique-Total), the reads mapped to multiple loci of the human genome (Multiple) and reads unable to be mapped to the human genome (Unmapped).
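The columns of Table 1 are related arithmetically. Unique-Total is the sum of Unique-Genome and Unique-Bridged. Unique-Total, Multiple and Unmapped together add up to the total filtered reads. Each percentage is a count divided by the total filtered reads. The sketch below checks these relations for the exosome I and II rows, with counts transcribed from the table.

```python
# Counts transcribed from Table 1 (exosomes I and II).
table1 = {
    "Exosome I": {"total": 9935476, "genome": 3050922, "bridged": 52509,
                  "multiple": 287134, "unmapped": 6544911},
    "Exosome II": {"total": 7684370, "genome": 2055046, "bridged": 222812,
                   "multiple": 383641, "unmapped": 5022871},
}


def percentages(row: dict[str, int]) -> dict[str, float]:
    """Recompute Table 1 percentages relative to the total filtered reads."""
    unique_total = row["genome"] + row["bridged"]
    # Every filtered read is unique, multiply mapped or unmapped.
    if unique_total + row["multiple"] + row["unmapped"] != row["total"]:
        raise ValueError("row does not add up to the total filtered reads")
    parts = {"unique_total": unique_total, "genome": row["genome"],
             "bridged": row["bridged"], "multiple": row["multiple"],
             "unmapped": row["unmapped"]}
    return {name: 100 * n / row["total"] for name, n in parts.items()}


for sample, row in table1.items():
    print(sample, {name: round(value, 2) for name, value in percentages(row).items()})
```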

Of the approximately 10, 8 and 6 million quality-evaluated reads of exosomes I, II and WS, respectively (Table 1; Total filtered reads), 31, 30 and 2.5% were mapped uniquely to the human genome or to exon–exon junctions (Table 1; Unique-Total). In total, 4260922 reads were used for mapping in exosome I, 3572880 in exosome II and 285918 in WS; these numbers include both the 'Unique-Total' reads and the reads mapped to multiple loci ('Multiple'). Figure 2A shows the genomic context of pcRNAs and lncRNAs (intron, intergenic region, rRNA, other ncRNAs and pseudogene, classified according to Ensembl annotation) in the three samples. lncRNAs (86/57/73%, exosomes I/II/WS) were detected more frequently than pcRNAs (14/43/27%, exosomes I/II/WS) in all three samples. The main lncRNA categories were intergenic regions (54/19/32%, exosomes I/II/WS), introns (27/16/17%, exosomes I/II/WS) and pseudogenes (5.3/20/19%, exosomes I/II/WS). Other ncRNAs (0.61/1.9/3.9%, exosomes I/II/WS), such as precursors of sncRNAs (e.g., miRNAs, snoRNAs), and rRNAs (0.17/1.2/0.49%, exosomes I/II/WS) were detected at low levels. Intergenic regions accounted for the largest share of sequencing reads in exosome I and WS, whereas pcRNAs accounted for the largest share in exosome II. Because reads mapped to intergenic regions and introns do not correspond to specific genes, they were excluded from gene quantification (Fig. 2B, Known RNA). The numbers of reads mapped to known RNAs were 837872, 2322933 and 145374 for exosomes I, II and WS, respectively.

Fig. 2. (A) Genomic Context of Sequencing Reads in Salivary Exosomes and WS; (B) Sequences That Mapped to Known RNAs

Each pie chart shows the percentage of sequencing reads in exosomes I, II and WS. (A) A total of 4260922 reads from exosome I, 3572880 reads from exosome II and 285918 reads from WS were used for mapping. (B) The numbers of reads mapped to known RNAs were 837872, 2322933 and 145374 for exosomes I, II and WS, respectively.

The presence of pseudogene-derived reads in the transcriptome suggests that processed pseudogene transcripts were included in salivary exosomes and WS. Recent studies have shown that processed pseudogenes function in gene regulation.27) We therefore focused on the relationship between pcRNAs and pseudogenes, as described below.

Analysis of Common Reads in Exosomes I, II and WS

Figure 3 shows Venn diagrams of all RNAs, pcRNAs and pseudogenes classified by Ensembl annotation. The numbers of quantified genes (RPKM ≥10) were 3035, 4826 and 2489 in exosomes I, II and WS, respectively (Supplementary Tables S1–S3). In total, 11% (1189) of all RNAs (10350), 10% (671) of pcRNAs (6454) and 17% (399) of pseudogenes (2281) were detected in all three samples. The transcripts detected in exosome I were highly similar to those of exosome II: in exosome I, 68% (2075) of all RNAs (3035), 84% (1297) of pcRNAs (1541) and 74% (532) of pseudogenes (720) were shared with exosome II. In exosome II, 47% (2257) of all RNAs, 50% (1665) of pcRNAs and 32% (298) of pseudogenes were detected only in exosome II. The number of pcRNAs in WS was similar to that in exosome I, whereas WS contained fewer total RNAs and pseudogenes than exosomes I and II.
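The overlap counts behind Fig. 3 amount to set intersections over gene identifiers, as in the sketch below with hypothetical gene sets. One assumption is worth flagging. The denominator for the "detected in all three samples" percentages appears to be the sum of the per-sample counts (3035 + 4826 + 2489 = 10350), not the size of the union, so the sketch follows that convention.

```python
def overlap_summary(exo1: set[str], exo2: set[str], ws: set[str]) -> dict[str, float]:
    """Shared-gene percentages in the style of the Fig. 3 description."""
    common_all = exo1 & exo2 & ws
    summed = len(exo1) + len(exo2) + len(ws)  # assumed denominator (sum, not union)
    return {
        "common_all_pct": 100 * len(common_all) / summed,
        "exo1_shared_with_exo2_pct": 100 * len(exo1 & exo2) / len(exo1),
        "exo2_only_pct": 100 * len(exo2 - exo1 - ws) / len(exo2),
    }


# Hypothetical gene sets.
print(overlap_summary({"A", "B", "C", "D"}, {"A", "B", "C", "E", "F"}, {"A", "G"}))

# The reported counts under the same convention:
print(round(100 * 1189 / (3035 + 4826 + 2489)))  # 11, matching the reported 11%
```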

Fig. 3. Venn Diagrams of All RNAs, Protein-Coding RNAs and Pseudogenes Expressed in Exosomes I, II and WS

Numbers indicate the RNAs shared among samples after mapping. Numbers in parentheses indicate the total numbers of RNAs expressed in exosomes I, II and WS.

Scatter plots of RPKM values showed high correlations between exosome I and exosome II (r2=0.96), exosome I and WS (r2=0.77), and exosome II and WS (r2=0.79) (Supplementary Fig. S2). This suggests that the highly expressed transcripts largely coincided among the three samples.
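For readers reproducing Supplementary Fig. S2, the coefficient of determination for a pairwise scatter plot is the square of Pearson's correlation coefficient. The text does not state whether RPKM values were log-transformed or how genes missing from one sample were handled. The sketch therefore assumes log10(RPKM + 1) over genes detected in both samples. Those choices are assumptions, and the input values are hypothetical.

```python
import math


def r_squared(x: list[float], y: list[float]) -> float:
    """Square of Pearson's correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov * cov / (var_x * var_y)


def paired_log_rpkm(s1: dict[str, float],
                    s2: dict[str, float]) -> tuple[list[float], list[float]]:
    """log10(RPKM + 1) for genes present in both samples (assumed transform)."""
    shared = sorted(s1.keys() & s2.keys())
    return ([math.log10(s1[g] + 1) for g in shared],
            [math.log10(s2[g] + 1) for g in shared])


# Hypothetical RPKM values for three genes in two samples.
x, y = paired_log_rpkm({"g1": 9, "g2": 99, "g3": 999}, {"g1": 20, "g2": 150, "g3": 800})
print(round(r_squared(x, y), 3))
```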

Characteristics of the Different Classes of RNAs

The 30 most highly expressed RNAs in exosomes I, II and WS are shown in Table 2. Complete lists of RNAs for exosomes I, II and WS are given in Supplementary Tables S1–S3. These list the RPKM rankings of all RNAs, pcRNAs and pseudogenes, together with the highest-RPKM pseudogene and information on the nearest pseudogenes (see Materials and Methods). Among all RNAs, the lncRNA with GenBank accession No. AC091047.1 was the most highly expressed sequence in all three samples. Exosomes I and II shared the same five most highly expressed RNAs, and the same five RNAs were also the most highly expressed in WS. The compositions of the 30 most highly expressed RNAs were comparable among the three samples: pseudogenes (50/47/53%, exosomes I/II/WS), other ncRNAs (30/27/27%, exosomes I/II/WS), pcRNAs (17/23/17%, exosomes I/II/WS) and rRNAs (3/3/3%, exosomes I/II/WS). Although pseudogenes and other ncRNAs do not dominate the read counts in Fig. 2B, they are the most abundant categories among the RNAs with the highest RPKM values (Table 2).
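The category percentages quoted for Table 2 are simple tallies over the 30 rows of each list. For example, the category column of Table 2A (exosome I), transcribed in rank order, can be tallied as follows.

```python
from collections import Counter

# Category column of Table 2A (exosome I), transcribed in rank order 1-30.
exosome_i = (
    ["Other NC", "Pseudogene", "Other NC", "Other NC", "Other NC", "rRNA"]  # ranks 1-6
    + ["Pseudogene"] * 12                                                   # ranks 7-18
    + ["ProteinCoding", "Other NC", "Other NC", "ProteinCoding", "Pseudogene",
       "Other NC", "Other NC", "Other NC", "Pseudogene"]                    # ranks 19-27
    + ["ProteinCoding"] * 3                                                 # ranks 28-30
)


def composition(categories: list[str]) -> dict[str, int]:
    """Whole-number percentage of list entries falling in each category."""
    counts = Counter(categories)
    return {cat: round(100 * n / len(categories)) for cat, n in counts.items()}


print(composition(exosome_i))
# Pseudogene 50, Other NC 30, ProteinCoding 17, rRNA 3, as reported in the text
```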

Table 2. The 30 Most Highly Expressed Genes in Exosomes I, II and WS
A. Exosome I

| Rank | Ensembl gene ID | Gene short name | Locus | RPKM | Category* |
|---|---|---|---|---|---|
| 1 | ENSG00000252197 | AC091047.1 | chr8: 70602343–70602417 | 478152 | Other NC |
| 2 | ENSG00000240831 | AC112777.1 | chr12: 20704357–20704522 | 369166 | Pseudogene |
| 3 | ENSG00000252229 | AC098691.1 | chr1: 91852861–91852949 | 142297 | Other NC |
| 4 | ENSG00000252318 | AC097532.1 | chr2: 133038646–133038738 | 132016 | Other NC |
| 5 | ENSG00000251948 | AC092279.1 | chr19: 24184074–24184165 | 107644 | Other NC |
| 6 | ENSG00000251705 | RN5-8S6 | chrY: 10037763–10037915 | 46183 | rRNA |
| 7 | ENSG00000242257 | AC044839.1 | chr11: 45848152–45848297 | 40074 | Pseudogene |
| 8 | ENSG00000241482 | AC064836.1 | chr2: 203210988–203211097 | 32662 | Pseudogene |
| 9 | ENSG00000226958 | RN28S1 | chrX: 108297360–108297792 | 25448 | Pseudogene |
| 10 | ENSG00000239935 | AC116340.1 | chr5: 71146739–71146942 | 23479 | Pseudogene |
| 11 | ENSG00000241530 | AC006368.1 | chr2: 230045487–230045666 | 18265 | Pseudogene |
| 12 | ENSG00000241376 | AL606830.1 | chr6: 120583431–120583547 | 17007 | Pseudogene |
| 13 | ENSG00000243013 | AL592307.1 | chr1: 145277249–145277501 | 16710 | Pseudogene |
| 14 | ENSG00000256393 | AC138123.2 | chr12: 93477373–93477451 | 16498 | Pseudogene |
| 15 | ENSG00000242604 | AL512503.1 | chr1: 120543873–120544125 | 16141 | Pseudogene |
| 16 | ENSG00000243185 | AC108078.1 | chr4: 70296578–70296753 | 15958 | Pseudogene |
| 17 | ENSG00000227063 | RPL41P1 | chr20: 21735865–21736171 | 12936 | Pseudogene |
| 18 | ENSG00000243884 | AL163011.1 | chr14: 90341364–90341577 | 11654 | Pseudogene |
| 19 | ENSG00000143546 | S100A8 | chr1: 153362507–153363664 | 11260 | ProteinCoding |
| 20 | ENSG00000210140 | J01415.10 | chrMT: 5760–5826 | 8232 | Other NC |
| 21 | ENSG00000210144 | J01415.11 | chrMT: 5825–5891 | 7889 | Other NC |
| 22 | ENSG00000213741 | RPS29 | chr14: 50043389–50065408 | 5407 | ProteinCoding |
| 23 | ENSG00000243172 | AP003035.1 | chr11: 85195011–85195304 | 5173 | Pseudogene |
| 24 | ENSG00000210174 | J01415.16 | chrMT: 10404–10469 | 5038 | Other NC |
| 25 | ENSG00000252248 | AC093693.1 | chr7: 68527370–68527457 | 4998 | Other NC |
| 26 | ENSG00000201098 | RNY1 | chr7: 148684227–148684340 | 4849 | Other NC |
| 27 | ENSG00000253945 | RP11-328L11.1.1 | chr8: 96416029–96416122 | 4658 | Pseudogene |
| 28 | ENSG00000171195 | MUC7 | chr4: 71296208–71348714 | 4465 | ProteinCoding |
| 29 | ENSG00000205649 | HTN3 | chr4: 70894129–70902255 | 4423 | ProteinCoding |
| 30 | ENSG00000197756 | RPL37A | chr2: 217362911–217443903 | 3743 | ProteinCoding |

B. Exosome II

| Rank | Ensembl gene ID | Gene short name | Locus | RPKM | Category* |
|---|---|---|---|---|---|
| 1 | ENSG00000252197 | AC091047.1 | chr8: 70602343–70602417 | 1.02E+06 | Other NC |
| 2 | ENSG00000240831 | AC112777.1 | chr12: 20704357–20704522 | 931345 | Pseudogene |
| 3 | ENSG00000252229 | AC098691.1 | chr1: 91852861–91852949 | 433010 | Other NC |
| 4 | ENSG00000252318 | AC097532.1 | chr2: 133038646–133038738 | 410868 | Other NC |
| 5 | ENSG00000251948 | AC092279.1 | chr19: 24184074–24184165 | 316818 | Other NC |
| 6 | ENSG00000256393 | AC138123.2 | chr12: 93477373–93477451 | 193124 | Pseudogene |
| 7 | ENSG00000227063 | RPL41P1 | chr20: 21735865–21736171 | 173497 | Pseudogene |
| 8 | ENSG00000242257 | AC044839.1 | chr11: 45848152–45848297 | 112919 | Pseudogene |
| 9 | ENSG00000241482 | AC064836.1 | chr2: 203210988–203211097 | 100453 | Pseudogene |
| 10 | ENSG00000143546 | S100A8 | chr1: 153362507–153363664 | 96291 | ProteinCoding |
| 11 | ENSG00000226958 | RN28S1 | chrX: 108297360–108297792 | 74745 | Pseudogene |
| 12 | ENSG00000241376 | AL606830.1 | chr6: 120583431–120583547 | 65417 | Pseudogene |
| 13 | ENSG00000210140 | J01415.10 | chrMT: 5760–5826 | 63515 | Other NC |
| 14 | ENSG00000251705 | RN5-8S6 | chrY: 10037763–10037915 | 63089 | rRNA |
| 15 | ENSG00000210144 | J01415.11 | chrMT: 5825–5891 | 58794 | Other NC |
| 16 | ENSG00000239935 | AC116340.1 | chr5: 71146739–71146942 | 48285 | Pseudogene |
| 17 | ENSG00000243013 | AL592307.1 | chr1: 145277249–145277501 | 47861 | Pseudogene |
| 18 | ENSG00000242604 | AL512503.1 | chr1: 120543873–120544125 | 45872 | Pseudogene |
| 19 | ENSG00000241530 | AC006368.1 | chr2: 230045487–230045666 | 45111 | Pseudogene |
| 20 | ENSG00000243185 | AC108078.1 | chr4: 70296578–70296753 | 42385 | Pseudogene |
| 21 | ENSG00000243884 | AL163011.1 | chr14: 90341364–90341577 | 28801 | Pseudogene |
| 22 | ENSG00000171195 | MUC7 | chr4: 71296208–71348714 | 28477 | ProteinCoding |
| 23 | ENSG00000205649 | HTN3 | chr4: 70894129–70902255 | 28160 | ProteinCoding |
| 24 | ENSG00000210112 | J01415.6 | chrMT: 4401–4469 | 27693 | Other NC |
| 25 | ENSG00000241800 | AC114498.5 | chr1: 567995–568067 | 24865 | Pseudogene |
| 26 | ENSG00000229117 | RPL41 | chr12: 56510369–56511727 | 24393 | ProteinCoding |
| 27 | ENSG00000210151 | J01415.12 | chrMT: 7445–7514 | 23862 | Other NC |
| 28 | ENSG00000197756 | RPL37A | chr2: 217362911–217443903 | 21331 | ProteinCoding |
| 29 | ENSG00000131469 | RPL27 | chr17: 41150445–41154956 | 20401 | ProteinCoding |
| 30 | ENSG00000213741 | RPS29 | chr14: 50043389–50065408 | 19147 | ProteinCoding |

C. Whole saliva

| Rank | Ensembl gene ID | Gene short name | Locus | RPKM | Category* |
|---|---|---|---|---|---|
| 1 | ENSG00000252197 | AC091047.1 | chr8: 70602343–70602417 | 3.65E+06 | Other NC |
| 2 | ENSG00000252229 | AC098691.1 | chr1: 91852861–91852949 | 3.03E+06 | Other NC |
| 3 | ENSG00000251948 | AC092279.1 | chr19: 24184074–24184165 | 2.97E+06 | Other NC |
| 4 | ENSG00000240831 | AC112777.1 | chr12: 20704357–20704522 | 2.38E+06 | Pseudogene |
| 5 | ENSG00000252318 | AC097532.1 | chr2: 133038646–133038738 | 805671 | Other NC |
| 6 | ENSG00000243185 | AC108078.1 | chr4: 70296578–70296753 | 515461 | Pseudogene |
| 7 | ENSG00000239935 | AC116340.1 | chr5: 71146739–71146942 | 231714 | Pseudogene |
| 8 | ENSG00000242257 | AC044839.1 | chr11: 45848152–45848297 | 199273 | Pseudogene |
| 9 | ENSG00000226958 | RN28S1 | chrX: 108297360–108297792 | 183758 | Pseudogene |
| 10 | ENSG00000251705 | RN5-8S6 | chrY: 10037763–10037915 | 172762 | rRNA |
| 11 | ENSG00000241376 | AL606830.1 | chr6: 120583431–120583547 | 151241 | Pseudogene |
| 12 | ENSG00000241482 | AC064836.1 | chr2: 203210988–203211097 | 142535 | Pseudogene |
| 13 | ENSG00000241530 | AC006368.1 | chr2: 230045487–230045666 | 142027 | Pseudogene |
| 14 | ENSG00000256393 | AC138123.2 | chr12: 93477373–93477451 | 138602 | Pseudogene |
| 15 | ENSG00000243013 | AL592307.1 | chr1: 145277249–145277501 | 133734 | Pseudogene |
| 16 | ENSG00000242604 | AL512503.1 | chr1: 120543873–120544125 | 131823 | Pseudogene |
| 17 | ENSG00000227063 | RPL41P1 | chr20: 21735865–21736171 | 127740 | Pseudogene |
| 18 | ENSG00000243172 | AP003035.1 | chr11: 85195011–85195304 | 77836 | Pseudogene |
| 19 | ENSG00000243884 | AL163011.1 | chr14: 90341364–90341577 | 58579 | Pseudogene |
| 20 | ENSG00000201098 | RNY1 | chr7: 148684227–148684340 | 56291 | Other NC |
| 21 | ENSG00000244469 | TRNAU2 | chr22: 44546536–44546622 | 51075 | Pseudogene |
| 22 | ENSG00000143546 | S100A8 | chr1: 153362507–153363664 | 45291 | ProteinCoding |
| 23 | ENSG00000188846 | RPL14 | chr3: 40498782–40506549 | 32624 | ProteinCoding |
| 24 | ENSG00000258486 | RN7SL1 | chr14: 50053296–50053596 | 28711 | Other NC |
| 25 | ENSG00000241781 | AL161626.1 | chr9: 79186648–79186950 | 27011 | Pseudogene |
| 26 | ENSG00000216144 | AL136373.1 | chr1: 47006015–47006106 | 25389 | Other NC |
| 27 | ENSG00000206696 | SNORD58B | chr18: 47018033–47018099 | 24835 | Other NC |
| 28 | ENSG00000131469 | RPL27 | chr17: 41150445–41154956 | 21620 | ProteinCoding |
| 29 | ENSG00000229117 | RPL41 | chr12: 56510369–56511727 | 20448 | ProteinCoding |
| 30 | ENSG00000197756 | RPL37A | chr2: 217362911–217443903 | 18974 | ProteinCoding |

*Categories are provided by Ensembl annotations. 'Other NC' means lncRNAs other than pseudogenes and rRNAs.

The 10 most highly expressed pcRNAs in exosomes I, II and WS are shown in Table 3. Among pcRNAs, S100A8 was the most abundant sequence in all three samples. RNAs encoding ribosomal proteins (large-subunit ribosomal proteins [RPL] and small-subunit ribosomal proteins [RPS]) predominated in all three samples (60/57/53%, exosomes I/II/WS) (Table 3 and Supplementary Tables S1–S3). RNAs of salivary proteins, including MUC7, HTN3 and STATH, were also detected. Apart from ribosomal and salivary proteins, the most highly expressed pcRNA in exosomes I and II was tumor protein, translationally controlled 1 (TPT1) (Table 3).

Table 3. The 10 Most Highly Expressed pcRNAs in Exosomes I, II and WS
| Rank | Ensembl gene ID | Gene short name | RPKM | Pseudogene ID (Ensembl) | Nearest pseudogene ID (Pseudogene.org) | RPKM of related pseudogene | Redundancy* |
|---|---|---|---|---|---|---|---|
Exosome I
Exosome II
Whole saliva

*Number of redundant reads sharing the same parental gene among the pseudogenes. **Translated proteins were detected in the proteome data.14)

We compared the pcRNA data with the proteome data of exosome I (105 proteins) and exosome II (154 proteins).14) Among the top 10 pcRNAs, the proteins encoded by two RNAs (S100A8, MUC7) were detected in both types of exosomes. In total, pcRNAs corresponding to 35 proteins of exosome I (33%) and 66 proteins of exosome II (43%) were detected in the respective exosomes. In WS, pcRNAs corresponding to 35 proteins of exosome I (33%) and 39 proteins of exosome II (25%) were detected. These results suggest that pcRNAs and the proteins they encode may coexist in secreted salivary exosomes, and that part of the pcRNA population in WS may be derived from salivary exosomes.

Investigating Pseudogenes and Parent Genes

As described above, pseudogenes were detected in all three samples. We therefore investigated whether pseudogenes and their parental genes were detected together in each sample. The numbers of RNAs annotated as pseudogenes by Ensembl in exosomes I, II and WS were 720, 945 and 616, respectively (RPKM ≥10; Supplementary Tables S1–S3). To identify the nearest pseudogenes, the start and end positions of these pseudogenes were searched in the Pseudogene.org database. The numbers of nearest pseudogenes identified were 388, 608 and 377 in exosomes I, II and WS, respectively (Supplementary Tables S1–S3). The parental genes of the nearest pseudogenes were then searched for in the pcRNA datasets. Among all parental genes of the nearest pseudogenes, ribosomal protein genes predominated in all three samples (62/65/73%, exosomes I/II/WS). Because a single pcRNA often has more than one pseudogene, Table 3 shows the nearest pseudogene with the highest RPKM value; all such pseudogenes are listed in Supplementary Tables S1–S3. After redundant parental genes were removed, the numbers of distinct parental genes were 155, 194 and 128 in exosomes I, II and WS, respectively (Supplementary Tables S1–S3). Among the parental genes of the highest-RPKM nearest pseudogenes, ribosomal proteins were again common in all three samples (43/38/52%, exosomes I/II/WS). For TPT1, the RPKM values of both the pcRNA and its pseudogene were high, especially in exosomes I and II. Although the S100A8 pcRNA showed the highest RPKM value among pcRNAs in all three samples, no S100A8 pseudogene was detected.

Figure 4 shows Venn diagrams of the intersections between pcRNAs and the parental genes of the pseudogenes. In exosome I, 71% (110/155) of the parental genes of the pseudogenes were also detected as pcRNAs; the corresponding values were 86% (166/194) in exosome II and 80% (102/128) in WS.

Fig. 4. Venn Diagrams of Protein-Coding RNAs and the Parent Genes of the Pseudogenes Expressed in Exosomes I, II and WS

Pseudogene counts refer to the nearest pseudogene with the highest RPKM value.

Gene Ontology Analysis

Because little is known about the functions of proteins translated from the RNAs of salivary exosomes, we performed GO analysis of pcRNAs using CateGOrizer. The distributions of biological process (Table 4A) and molecular function categories (Table 4B) were similar among the three samples. The three most common functional groups in the biological process category were cellular process (29/30/30%, exosomes I/II/WS), metabolism (16/15/14%, exosomes I/II/WS) and macromolecule metabolism (12/12/11%, exosomes I/II/WS). The three most common functional groups in the molecular function category were binding (38/38/39%, exosomes I/II/WS), protein binding (21/21/21%, exosomes I/II/WS) and catalytic activity (11/12/11%, exosomes I/II/WS). These results suggest that the RNAs in the two types of salivary exosomes and in WS may play similar roles in the oral cavity.

Table 4. GO Terms in the Biological Process and the Molecular Function Category
A. Biological process category
GO Class ID | Definition | pcRNA (%): Exosome I, Exosome II, WS | Parent gene of pseudogene (%): Exosome I, Exosome II, WS
GO:0006139 | Nucleobase, nucleoside, nucleotide and nucleic acid metabolism | 5.
GO:0006928 | Cell motility | 0.61 0.54 0.58 0.27 0.25 0.31
GO:0006944 | Membrane fusion | 0.034 0.037 0.040
GO:0007154 | Cell communication | 3.
GO:0008219 | Cell death | 1.
GO:0009987 | Cellular process | 29.1 29.6 29.6 28.8 28.7 29.3
GO:0030154 | Cell differentiation | 1.
GO:0043062 | Extracellular structure organization and biogenesis | 0.017 0.027 0.035 0.014
GO:0043170 | Macromolecule metabolism | 12.2 11.6 11.3 19.3 18.5 19.0
GO:0050789 | Regulation of biological process | 9.9 10.4 10.
GO:0050896 | Response to stimulus | 8.
Total counts |  | 35644 69931 37624 5206 7079 5409
B. Molecular function category
GO Class ID | Definition | pcRNA (%): Exosome I, Exosome II, WS | Parent gene of pseudogene (%): Exosome I, Exosome II, WS
GO:0003676 | Nucleic acid binding | 7.4 6.8 7.5 15.1 13.7 16.5
GO:0003774 | Motor activity | 0.098 0.
GO:0003824 | Catalytic activity | 10.8 12.
GO:0004386 | Helicase activity | 0.25 0.37 0.25 0.30 0.23
GO:0004871 | Signal transducer activity | 1.
GO:0004872 | Receptor activity | 1.
GO:0005198 | Structural molecule activity | 2.
GO:0005215 | Transporter activity | 2.
GO:0005515 | Protein binding | 20.7 20.5 20.9 14.6 16.8 16.5
GO:0008565 | Protein transporter activity | 0.
GO:0008907 | Integrase activity | 0.028 0.0064 0.013
GO:0015075 | Ion transporter activity | 1.3 0.84 1.1 0.91 1.7 0.33
GO:0015267 | Channel or pore class transporter activity | 0.34 0.22 0.38 0.34
GO:0016209 | Antioxidant activity | 0.18 0.21 0.13
GO:0016301 | Kinase activity | 1.
GO:0016491 | Oxidoreductase activity | 2.
GO:0016740 | Transferase activity | 2.
GO:0016787 | Hydrolase activity | 4.
GO:0016829 | Lyase activity | 0.
GO:0016853 | Isomerase activity | 0.17 0.31 0.21 0.23 0.33
GO:0016874 | Ligase activity | 0.91 1.1 0.75 0.15 0.23 0.17
GO:0030234 | Enzyme regulator activity | 1.
GO:0045182 | Translation regulator activity | 0.13 0.052 0.091 0.30 0.22 0.50
Total counts |  | 7156 15533 7719 656 874 599

Values are percentages within each category. The data are based on differentially expressed pcRNAs and parent genes of the pseudogenes with RPKM ≥ 10. GO analysis was performed using CateGOrizer. Total counts are the sum of the read counts across all categories. Hyphens indicate that no RNAs were assigned to that category.
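The footnote implies two computations: an RPKM filter (≥ 10) and per-category percentages of a total count. The sketch below is a minimal illustration under stated assumptions: it uses the standard RPKM definition (reads × 10^9 / (gene length in bp × total mapped reads)), takes the gene-to-GO-class mapping as already given, and weights each category by read count, as the "Total counts" wording suggests. It does not reproduce CateGOrizer, and the gene names, counts and GO assignments are hypothetical.

```python
from collections import Counter


def rpkm(reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return reads * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical input (illustrative only, not data from this study):
# gene -> (mapped reads, gene length in bp, GO classes assigned to the gene)
GENES = {
    "GENE_A": (5000, 1000, ["GO:0009987", "GO:0043170"]),
    "GENE_B": (2000, 4000, ["GO:0009987"]),
    "GENE_C": (10, 2000, ["GO:0050896"]),
}
TOTAL_MAPPED = 10_000_000
RPKM_CUTOFF = 10.0


def go_percentages(genes, total_mapped, cutoff):
    """Keep genes with RPKM >= cutoff, sum their read counts per GO class
    (assumed read-count weighting), and return (percent per class, total)."""
    counts = Counter()
    for reads, length, go_classes in genes.values():
        if rpkm(reads, length, total_mapped) < cutoff:
            continue  # below the RPKM threshold, so excluded
        for go_id in go_classes:
            counts[go_id] += reads
    total = sum(counts.values())
    if total == 0:
        return {}, 0  # nothing passed the filter
    percents = {go_id: 100.0 * n / total for go_id, n in counts.items()}
    return percents, total


percents, total = go_percentages(GENES, TOTAL_MAPPED, RPKM_CUTOFF)
print(total)  # sum of the per-class counts, analogous to the "Total counts" row
for go_id, pct in sorted(percents.items()):
    print(f"{go_id}: {pct:.1f}%")
```

A gene annotated with several GO classes contributes its reads to each of them, so under this assumption the total count can exceed the number of reads that pass the filter.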

We also performed GO analysis of the parent genes of the pseudogenes (Table 4). In the biological process category, the three most common functional groups of the parental genes were cellular process (29/29/29%, exosomes I/II/WS), metabolism (22/21/20%) and macromolecule metabolism (19/19/19%). These are the same three groups that ranked highest for pcRNAs, although the metabolism-related percentages were higher for the parental genes. In the molecular function category, the three most common groups of the parental genes were binding (38/38/42%, exosomes I/II/WS), protein binding (15/17/17%) and nucleic acid binding (15/14/17%). In contrast, catalytic activity accounted for a smaller share of the parental genes than of the pcRNAs in exosomes I, II and WS (8.1/7.1/4.7% vs. 11/12/11%, exosomes I/II/WS).


We performed transcriptome analysis of the long RNAs in salivary exosomes and WS using NGS. Palanisamy et al. previously analyzed mRNAs in salivary exosomes by microarray and detected 509 mRNAs.28) To the best of our knowledge, the current study is the first comprehensive analysis of the lncRNAs and pcRNAs of salivary exosomes. In addition to pcRNAs, lncRNAs of pseudogenes were abundant in exosomes. In a previous study, we performed NGS of small non-coding RNAs and found miRNAs, piRNAs, snoRNAs and other small RNAs in exosomes I, II and WS.16) A recent study showed that ncRNAs were highly abundant in exosomes secreted by the HeLa and MCF-7 cell lines.29) More recently, ncRNAs of human urinary exosomes were analyzed using NGS.20) Together, these results indicate that exosomes can carry many types of RNA and can potentially transfer large amounts of information between cells. Exosomes have been shown to mediate communication between neighboring cells by delivering RNA signals, and the mRNAs they contain can be reverse-transcribed into cDNA or translated in the recipient cell.30) However, further study will be needed to determine whether the pcRNAs in salivary exosomes are translated in recipient cells.

The proportion of WS reads that did not map to the human genome was higher than that for exosomes I and II (Table 1). In our preliminary analyses, we detected RNAs of exogenous species such as bacteria and fungi in WS (data not shown). Moreover, most extracellular miRNAs are bound to Ago2 proteins rather than encapsulated within exosomes.31) These factors may explain why the total RNA yield from WS was larger than that from exosomes. However, the status of these exogenous RNAs is not clear, and further study is needed to identify the exogenous RNAs in WS and exosomes.

Ribosomal protein genes (RPL, RPS) were the most abundant class of pcRNAs. Previous studies reported that ribosomal protein transcripts were predominantly expressed in urinary exosomes20) and salivary exosomes.28) The pcRNA with the highest RPKM was S100A8 in the two types of salivary exosomes and WS. The translated product of S100A8 is S100 calcium-binding protein A8 (protein S100-A8), which has been detected in saliva32) and salivary exosomes.14) Protein S100-A8 often forms a complex with S100 calcium-binding protein A9 (protein S100-A9); this complex is called calprotectin.33) The S100A9 pcRNA was found in exosomes I and II in this study (Supplementary Tables S1, S2), and protein S100-A9 was found in our previous proteomic study.14) Calprotectin plays important roles in the regulation of inflammatory processes and the immune response. It activates leukocytes and promotes cytokine production via Toll-like receptor 4.33)

Our previous proteomic study showed that exosomes I and II preferentially contained immune-related proteins, such as IgA and polymeric immunoglobulin receptor.14) However, RNAs of immune-related proteins were rarely detected by NGS. Although DPP4 (also known as CD26) is abundant in exosome II,14) DPP4 RNA was not detected in any of the three samples. Specific mRNA and miRNA molecules are thought to be selectively loaded into exosomes, but the mechanism remains unclear and requires further investigation.30)

In our study, pcRNAs and the parental genes of the pseudogenes largely overlapped in exosomes I, II and WS. Recent studies showed that transcribed pseudogenes can regulate the translation of homologous protein-coding genes, such as the mRNAs of their parental genes, through a small interfering RNA (siRNA)-like function and/or by acting as miRNA sponges.27,34) Notably, miRNAs were abundant in exosomes I and II.16) In addition, in the GO category of nucleic acid binding, the percentage of parental genes of pseudogenes was higher than that of pcRNAs in all three samples (Table 4B). It is possible that the pseudogenes, together with miRNAs, regulate the corresponding mRNAs in salivary exosomes and in the target cells of the exosomes. Pseudogenes of highly expressed pcRNAs encoding salivary proteins, such as MUC7, were not detected (Supplementary Tables S1–S3). Because salivary proteins are continuously secreted, their pseudogenes may not need to be expressed.

The translated product of TPT1 is translationally controlled tumor protein (TCTP), a highly conserved protein that is expressed in all eukaryotic organisms35) and plays an important role in cell proliferation, cell death and immune responses. Notably, the pseudogene of TPT1 is also highly expressed in exosomes I and II (Supplementary Tables S1, S2). Previous studies showed that TCTP is expressed in salivary glands.36) By Western blot analysis, TCTP was detected in WS but not in exosomes I and II (data not shown). The functions of TPT1 and its pseudogenes in exosomes in the oral cavity, including the salivary glands, should be examined in future studies.

In conclusion, our study is the first report demonstrating that salivary exosomes contain a large repertoire of lncRNAs, such as processed pseudogenes, in addition to pcRNAs. The mRNA content of exosomes is modulated by the physiological state of the cell and by stress conditions, and may therefore be useful for investigating the functional state of oral tissue.37,38) Transcriptional profiles of salivary exosomes can be obtained non-invasively and may aid the discovery of new biomarkers of oral diseases such as salivary gland cancer.


We thank Dr. Yoshitaka Taketomi and Dr. Makoto Murakami of the Lipid Metabolism Project, Tokyo Metropolitan Institute of Medical Science, for technical assistance with exosomal RNA detection. We acknowledge Dr. Tsukasa Okada of Hokkaido System Science Co., Ltd. for support with RNA data handling. We are grateful to Dr. Kazuma Aoki of Teikyo Heisei University for helpful discussions. This work was supported by JSPS KAKENHI Grant Numbers 25460172 and 25293083.

Conflict of Interest

The authors declare no conflict of interest.

Supplementary Materials

The online version of this article contains supplementary materials.

Fig. S1. Bioanalyzer profiles of amplified RNA isolated from exosome I, exosome II, and WS.

Fig. S2. Scatter plots of exosome I against exosome II, exosome I against WS and exosome II against WS.

Table S1. List of all RNAs of exosome I. RPKM rankings of all RNAs, pcRNAs, pseudogenes and the pseudogene of highest RPKM with information regarding the nearest pseudogenes (see Materials and Methods).

Table S2. List of all RNAs of exosome II. RPKM rankings of all RNAs, pcRNAs, pseudogenes and the pseudogene of highest RPKM with information regarding the nearest pseudogenes (see Materials and Methods).

Table S3. List of all RNAs of WS. RPKM rankings of all RNAs, pcRNAs, pseudogenes and the pseudogene of highest RPKM with information regarding the nearest pseudogenes (see Materials and Methods).

© 2016 The Pharmaceutical Society of Japan