The First Identification of a Narnavirus in Bigyra, a Marine Protist

Current information on the diversity and evolution of eukaryotic RNA viruses is biased towards a few host lineages, such as animals, plants, and fungi. Although protists represent the majority of eukaryotic diversity, our understanding of the protist RNA virosphere is still limited. To reveal untapped RNA viral diversity, we screened 30 marine protist isolates for RNA viruses and identified a novel RNA virus named Haloplacidia narnavirus 1 (HpNV1). A phylogenetic analysis revealed that HpNV1 is a new member of the family Narnaviridae. The present study fills a gap in the known distribution of narnaviruses and implies that they are widely distributed in Stramenopiles.


Data processing
To obtain cleaned reads, we removed low-quality, adapter, low-complexity, and rRNA sequences from the raw sequence reads as described previously (Hirai et al., 2021) with a custom Perl script (https://github.com/takakiy/FLDS). Following previous reports (Urayama et al., 2016, 2018), the cleaned reads were assembled de novo using CLC Genomics Workbench version 11.0 (CLC Bio, Aarhus, Denmark). To obtain full-length sequences, the assembled contigs were manually extended. For each extension, the reads were re-mapped to the contigs using CLC Genomics Workbench version 11.0, and the results were visualized using the Tablet viewer (Milne et al., 2010). Reads that aligned only partially at the termini of the contigs were then manually collected by inspecting the alignment and re-assembled. These operations were repeated until the extension of the contig was complete. If 10 or more reads stopped at the same position near the end of a contig, we recognized that position as the terminal end. When both ends of a contig were defined as termini, the contig was regarded as a full-length sequence. The assembled contigs were annotated by BLASTX analysis against the NCBI non-redundant protein database and RNA viral protein sequences detected in recent RNA virome studies (Chen et al., 2022; Neri et al., 2022; Zayed et al., 2022). To identify more distantly related RNA viruses, we performed RNA virus detection based on hidden Markov model (HMM) profiles. Contigs without any BLASTX hits at an e-value cutoff of 1 × 10⁻⁵ were searched against RVDB-prot (Bigot et al., 2019) and NeoRdRp (Sakaguchi et al., 2022) using HMMER3 (Eddy, 2011) with the default parameters and an e-value threshold of 1 × 10⁻⁵.
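
The terminus criterion above (10 or more reads stopping at the same position) was applied manually while viewing alignments. As an illustration only, the rule could be expressed in code roughly as follows. The function names, the input format (a list of contig coordinates at which individual read alignments stop), and the tie-breaking behaviour are assumptions made for this sketch and are not part of the original workflow.

```python
from collections import Counter
from typing import Iterable, Optional

# Threshold stated in the text: 10 or more reads stopping at one position.
MIN_READS_AT_TERMINUS = 10


def call_terminus(read_end_positions: Iterable[int],
                  min_reads: int = MIN_READS_AT_TERMINUS) -> Optional[int]:
    """Return the contig position at which at least `min_reads` reads stop.

    `read_end_positions` is a hypothetical input: for each read mapped near
    one end of the contig, the coordinate where its alignment stops.
    Returns None when no position reaches the threshold.
    If several positions qualify, the best-supported one is returned
    (ties go to the smaller coordinate); this tie-break is an assumption,
    since the text does not describe that case.
    """
    counts = Counter(read_end_positions)
    if not counts:
        return None
    position, support = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    return position if support >= min_reads else None


def is_full_length(five_prime_ends: Iterable[int],
                   three_prime_ends: Iterable[int],
                   min_reads: int = MIN_READS_AT_TERMINUS) -> bool:
    """A contig counts as full length only when both ends are called."""
    return (call_terminus(five_prime_ends, min_reads) is not None
            and call_terminus(three_prime_ends, min_reads) is not None)


# Toy data (invented for illustration, not from the study).
left_ends = [1] * 12 + [3, 4, 7]       # 12 reads stop at position 1
right_ends = [2950] * 9 + [2948] * 4   # best support is only 9 reads

print(call_terminus(left_ends))              # 1
print(call_terminus(right_ends))             # None
print(is_full_length(left_ends, right_ends)) # False
```

In this toy case the 5' end would be accepted, while the 3' end would need another round of extension before the contig could be called full length.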

RT-PCR analysis
To identify the host organism of the RNA virus detected in the pooled sequence data, we conducted RT-PCR analyses targeting the virus sequence. RT-PCR analysis was performed on each isolate contained in pool-2. Total nucleic acids were individually extracted from cells of each isolate with SDS-phenol and were used as the template. In this RT-PCR analysis, we used two specific primer pairs: Narna-P1F (5'-GGT ACG AAA AGG CCC GAT CA-3') and Narna-P1R (5'-ACA AGG CTC ATC TCC GCA AA-3'); and Narna-P2F (5'-TCG TCT TGG TCT TGA GCG TC-3') and Narna-P2R (5'-ATA CGC CCT CTT TGG AAC GG-3'). RT-PCR was performed using the SuperScript III One-Step RT-PCR System with Platinum Taq (Invitrogen) according to the manufacturer's protocol. PCR products were confirmed on a 1% agarose gel. Nucleic acids were stained with GelRed (Biotium, CA, USA). The amplified fragments were excised and purified using a FastGene Gel/PCR Extraction Kit (Nippon Genetics, Tokyo, Japan). The purified products were then subjected to direct Sanger sequencing.
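
For readers who want to check the primers against a contig sequence, the short sketch below computes the length, GC fraction, and reverse complement of each primer listed above. The primer sequences are taken from the text; the helper names are hypothetical, and this check is not described as part of the original analysis.

```python
# Primer sequences as given in the text (spaces removed, 5' to 3').
PRIMERS = {
    "Narna-P1F": "GGTACGAAAAGGCCCGATCA",
    "Narna-P1R": "ACAAGGCTCATCTCCGCAAA",
    "Narna-P2F": "TCGTCTTGGTCTTGAGCGTC",
    "Narna-P2R": "ATACGCCCTCTTTGGAACGG",
}

# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


for name, seq in PRIMERS.items():
    print(f"{name}\t{len(seq)} nt\tGC={gc_fraction(seq):.0%}\t"
          f"revcomp={reverse_complement(seq)}")
```

The reverse complement of each reverse primer is the string one would search for on the sense strand of the contig when locating the expected amplicon.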

DsRNA-sequencing from YPF1522
To obtain the complete genome sequence of the detected virus, we constructed a dsRNA sequencing library from cells of YPF1522. The cells were collected from cultures by centrifugation at 2,400 × g for 4 min, and dsRNA was purified from 0.16 g of the collected cells as described above.
The purified dsRNA was converted into dscDNA by FLDS ver. 3 (Hirai et al., 2021). The resultant dscDNA was used to construct an Illumina sequencing library as mentioned above, and the library was sequenced on the Illumina MiSeq platform with 300-bp paired-end reads (Illumina). More than 400,000 reads were obtained for each library. The raw sequence reads were processed as described above (see Data processing).

Phylogenetic analysis
In the phylogenetic analysis, the deduced amino acid sequences of the putative RdRp domains and related sequences were aligned using MUSCLE (Edgar, 2004) in MEGA6 (Tamura et al., 2013).