For Yamagishiella, Y. unicocca strains 2012-1026-YU-F2-6 (NIES-3982, mating-type plus) and 2012-1026-YU-F2-1 (NIES-3983, mating-type minus) were used throughout the study. For the whole-genome sequencing of Eudorina, Eudorina sp. strains 2010-623-F1-E4 (NIES-3984, female) and 2010-623-F1-E2 (NIES-3985, male) were used. For the other Eudorina experiments, two sibling strains of these, Eudorina sp. strains 2010-623-F1-E8 (NIES-4018, female) and 2010-623-F1-E3 (NIES-4100, male), were used unless otherwise stated.
De novo whole genome assembly
Genomic DNAs were prepared according to the method of Miller et al.33. Whole-genome sequencing of plus and minus strains of Y. unicocca and male and female strains of Eudorina sp. was performed using PacBio and Illumina technologies as described previously34. Briefly, genomic DNA was sheared using a DNA shearing tube, g-TUBE (Covaris). Several 20 kb libraries for P5-C3 and P6-C4 sequencing were constructed and sequenced on SMRT cells in PacBio RS II (Pacific Biosciences). These reactions generated 2.5 M and 4.1 M subreads (total bases: 17.2 Gb and 22.8 Gb) for plus and minus strains of Y. unicocca, respectively, and 2.6 M and 4.4 M subreads (total bases: 15.6 Gb and 23.0 Gb) for male and female strains of Eudorina sp., respectively. Based on the estimated genome sizes, sequencing coverage was about 128x, 162x, 93x, and 125x for these four strains, respectively. In each of the four strains, PacBio reads were assembled de novo with the HGAP3 assembler (Pacific Biosciences). Furthermore, genomic DNA was fragmented using a DNA Shearing System, S2 Focused-ultrasonicator (Covaris). Illumina paired-end libraries (insert sizes of 400 bp for plus and minus strains of Y. unicocca, 600 bp for the male strain of Eudorina sp., and 400 bp for the female strain of Eudorina sp.) were constructed using a TruSeq DNA Sample Prep Kit (Illumina) according to the manufacturer's instructions. These libraries were sequenced using Illumina HiSeq 2000 and 2500 sequencers (230.5 M and 197.0 M reads with 150 bp read length for plus and minus strains of Y. unicocca, respectively, 251.2 M reads with 250 bp read length for the male strain of Eudorina sp., and 165.3 M reads with 100 bp read length for the female strain of Eudorina sp.). Total bases and sequencing coverage were 34.6 Gb (258x), 29.5 Gb (210x), 62.8 Gb (372x), and 16.5 Gb (89x), respectively.
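The relationship between read count, read length, total bases, and coverage quoted above can be checked with a short calculation. The following is a minimal sketch, assuming that total bases equal the number of reads multiplied by the read length and that coverage equals total bases divided by genome size. The genome size estimates themselves are not given in this section, so the values the code derives are only back-calculated from the reported coverage, and all function and variable names are illustrative.

```python
# Illumina figures quoted in the text:
# (reads in millions, read length in bp, reported coverage in x).
ILLUMINA_RUNS = {
    "Y. unicocca plus": (230.5, 150, 258),
    "Y. unicocca minus": (197.0, 150, 210),
    "Eudorina sp. male": (251.2, 250, 372),
    "Eudorina sp. female": (165.3, 100, 89),
}


def total_bases_gb(reads_millions, read_length_bp):
    """Total sequenced bases in Gb, assuming every read has the nominal length."""
    return reads_millions * 1e6 * read_length_bp / 1e9


def implied_genome_size_mb(total_gb, coverage):
    """Genome size in Mb implied by coverage = total bases / genome size."""
    return total_gb * 1e3 / coverage


def summarize(runs):
    """Return {strain: (total bases in Gb, implied genome size in Mb)}."""
    rows = {}
    for strain, (reads_m, length_bp, coverage) in runs.items():
        gb = total_bases_gb(reads_m, length_bp)
        rows[strain] = (gb, implied_genome_size_mb(gb, coverage))
    return rows


if __name__ == "__main__":
    for strain, (gb, size_mb) in summarize(ILLUMINA_RUNS).items():
        print(f"{strain}: {gb:.1f} Gb, implied genome size ~{size_mb:.0f} Mb")
```

Applying the same division to the PacBio totals and coverages gives closely matching implied genome sizes for each strain, which is a quick consistency check on the reported numbers.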
The Illumina data were then mapped against the PacBio assembly sequence using BWA-MEM Release 0.7.7 (ref. 35), and errors were corrected with the samtools/bcftools/vcfutils.pl program v0.1.19 (https://samtools.sourceforge.net/), ultimately giving a set of nuclear genome sequences. We performed long-range scaffolding using paired-end Sanger sequences from 38,400 and 3,840 fosmid clones of female and male strains of Eudorina sp., respectively (DRA: DRA004920, DRA002727, and DRA004919; whole genome assembly: BDSI01000001-BDSI01003180, BDSJ01000001-BDSJ01002471, BDSK01000001-BDSK01001897, BDSL01000001-BDSL01001461).
Sex-determining region identification
Candidate scaffolds for entire sex-determining regions (Y. unicocca plus: Scaffold0026/0199/0237; minus: Scaffold0005/0230/0253/0437/1431; Eudorina sp. female: scaffold1024; male: scaffold1040) were screened by TBLASTN (NCBI) searches of the de novo assemblies of Y. unicocca and Eudorina sp., using 80 proteins on the V. carteri female MT (GenBank Acc. No. GU784915) as queries. Scaffolds were selected as major significant matching subjects when they had more than three non-overlapping protein hits (maximum E-value cutoff: 1e−10; ref. 36). The selected scaffolds were then compared between haplotypes of the same species by dot-plot analysis using YASS (https://bioinfo.lifl.fr/yass/index.php)37 to detect the rearranged genomic regions of MT.
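The screening criterion above (more than three non-overlapping protein hits per scaffold at an E-value of 1e−10 or lower) can be sketched as a filter over TBLASTN tabular output. This is an illustration only, not the procedure actually used: it assumes the standard 12-column BLAST tabular format (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), it interprets "non-overlapping" as hits whose subject coordinates on the scaffold do not overlap, and the function names are hypothetical.

```python
from collections import defaultdict


def parse_hits(lines, max_evalue=1e-10):
    """Group hits passing the E-value cutoff by subject scaffold.

    Each line is assumed to be one row of 12-column BLAST tabular output.
    """
    hits = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, scaffold = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        evalue = float(fields[10])
        if evalue <= max_evalue:
            # Minus-strand hits have sstart > send, so normalize the interval.
            low, high = sorted((sstart, send))
            hits[scaffold].append((low, high, query))
    return hits


def count_non_overlapping(intervals):
    """Largest number of mutually non-overlapping intervals (greedy by end)."""
    count = 0
    last_end = 0  # BLAST coordinates are 1-based
    for low, high, _query in sorted(intervals, key=lambda iv: iv[1]):
        if low > last_end:
            count += 1
            last_end = high
    return count


def candidate_scaffolds(lines, min_hits=4, max_evalue=1e-10):
    """Scaffolds with more than three (>= 4) non-overlapping hits."""
    hits = parse_hits(lines, max_evalue)
    return sorted(
        scaffold
        for scaffold, intervals in hits.items()
        if count_non_overlapping(intervals) >= min_hits
    )
```

Sorting by interval end and taking each hit that starts after the last accepted one is the usual greedy solution for counting a maximal set of non-overlapping intervals. A stricter reading that also requires the hits to come from distinct query proteins would need an extra check on the query names.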
We performed TBLASTN searches against the genome assembly databases of Y. unicocca and Eudorina sp. with the volvocine sex-limited proteins (Gonium pectorale FUS1, BAU61607 (ref. 11); G. pectorale MTD1, BAI49487 (ref. 38)) as the queries, retrieved sequences with the highest similarity, and designed gene-specific primers (listed in Supplementary Table 3) based on these sequences. To identify the ORF sequences, polyadenylated mRNAs from each sample were isolated using Dynabeads Oligo (dT)25 (Thermo Fisher Scientific), reverse transcribed with Superscript III reverse transcriptase (Thermo Fisher Scientific), and amplified with KOD FX Neo DNA polymerase (Toyobo) and the gene-specific primers. To obtain the full-length cDNA sequences, 5′RACE and 3′RACE were performed using the GeneRacer kit (Thermo Fisher Scientific). The PCR products were directly sequenced, or first cloned into the pCR4Blunt-TOPO vector (Thermo Fisher Scientific) and then sequenced, using an ABI PRISM 3100 Genetic Analyzer (Thermo Fisher Scientific) with a BigDye Terminator cycle sequencing ready reaction kit, v.3.1 (Thermo Fisher Scientific). Full-length MID genes of Y. unicocca (minus strain NIES-1859) and Eudorina sp. (male strain NIES-2735) were determined using the degenerate PCR method7, 8.
Other gene models on MT scaffolds were predicted by Augustus39 with the C. reinhardtii parameters and then manually curated based on their similarity to C. reinhardtii and V. carteri gene models (JGI).
MT genome sequences harboring rearranged domains with sex-limited genes and gametologs in Y. unicocca and Eudorina sp. and the autosomal gene YuMTD1 are available under accession numbers LC314412–LC314416.
Preparation of asexual and sex-induced samples of Yamagishiella and Eudorina
Asexual samples of Y. unicocca were obtained by culturing the algae in screw-cap tubes (18 × 150 mm) containing ~11 ml AF-6 medium40, 41, on a 14-h light/10-h dark cycle (light intensity: 60–110 μmol m−2 s−1) at 23 °C for 3–5 days. To induce sexual reproduction of Y. unicocca, asexually growing algae (~0.2 ml) were inoculated into 11 ml “VTAC + soil extract medium” (VTAC medium41, 42 supplemented with 3% (v/v) soil extract (~0.5 mg of paddy soil suspended in 20 ml distilled water and autoclaved for 10 min)) in a tube, and cultured for 8 days under the same conditions. The algal culture of each sex was then transferred into Petri dishes (60 mm × 15 mm), incubated for a further 4 days, and used as a sex-induced Y. unicocca sample for the following analysis. “Mixed” samples of Y. unicocca were obtained by mixing the sex-induced samples (1 ml each) of both mating types in Petri dishes (30 mm × 10 mm), which were subsequently incubated for 3 h under the same conditions and used for semiquantitative RT-PCR.
Asexual samples of Eudorina sp. were obtained by culturing the algae in the screw-cap tubes containing ~10 ml SVM medium43, on a 14-h light/10-h dark cycle (light intensity: 180–320 μmol m−2 s−1) at 25 °C for 3 days. To induce sexual reproduction of Eudorina sp., ~0.4 ml of asexually grown algae in SVM medium were inoculated into Petri dishes (60 mm × 15 mm) containing ~11 ml “VTAC + soil extract medium” and cultured for 3 days under the same conditions. The 11 ml culture of each sex was then transferred into Petri dishes (90 mm × 20 mm), diluted with twice its volume of mating medium42, and cultured for a further 8 h under the same conditions to form a sex-induced male or female culture of Eudorina sp., in which formation of sperm packets was observed in the male strain (Fig. 3b). “Mixed” samples of Eudorina sp. were prepared by mixing the sex-induced female and male samples (5 ml each) in Petri dishes (60 mm × 15 mm), which were subsequently incubated for 16 h under the same conditions and used for the following analysis.
All microscopic images were acquired using a BX53 microscope (Olympus) equipped with differential interference contrast optics. The digital images were captured using a DP71 camera (Olympus) with DP controller software (Olympus), and their levels were adjusted with Adobe Photoshop CS6 (Adobe Systems Inc.).
Semiquantitative RT-PCR analysis
From Y. unicocca samples, polyadenylated mRNAs were isolated using Dynabeads Oligo (dT)25 and reverse transcribed as described above. From Eudorina samples, total RNAs were extracted with TRI reagent (Molecular Research Center), treated with DNase I (amplification grade; Thermo Fisher Scientific), and reverse transcribed with Superscript III reverse transcriptase and Oligo (dT)20 primer (Thermo Fisher Scientific). PCR reactions were performed with KOD FX Neo DNA polymerase (Toyobo). Primer sequences and PCR conditions are listed in Supplementary Table 3 online. Under these conditions, all primer sets produced amplicons of the expected size and sequence (confirmed by direct sequencing). The PCR products were electrophoresed on 2% (w/v) agarose gels and stained with ethidium bromide44. The gel images were captured using a ChemiDoc XRS system (Bio-Rad) with Quantity One software (Bio-Rad) and their levels were adjusted as described above.
Molecular evolutionary analysis
Divergence scores of synonymous and non-synonymous substitutions between gametologs were computed using yn00 of the PAML4 package45; nonsynonymous and synonymous site divergence of aligned coding sequences of gametologs was calculated based on Yang and Nielsen46 with equal weighting between pathways, and the same codon frequency for all pairs11.
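For readers who wish to reproduce this kind of analysis, the two stated settings map naturally onto options in the yn00 control file. The sketch below writes such a file under several assumptions that are not stated in the text: that `weighting = 0` selects equal weighting between pathways, that `commonf3x4 = 1` applies one set of codon frequencies to all pairs, that the universal genetic code (`icode = 0`) is appropriate, and that the input and output file names (hypothetical here) are chosen by the user. The option values should be checked against the PAML documentation for the version in use.

```python
def yn00_control(seqfile, outfile, equal_weighting=True, common_codon_freqs=True):
    """Return the text of a yn00 control file as a string.

    Assumed option meanings (verify against the PAML manual):
      weighting  = 0 -> equal weighting of pathways between codons
      commonf3x4 = 1 -> one set of codon frequencies for all pairs
      icode      = 0 -> universal genetic code (an assumption here)
    """
    settings = [
        ("seqfile", seqfile),
        ("outfile", outfile),
        ("verbose", 0),
        ("icode", 0),
        ("weighting", 0 if equal_weighting else 1),
        ("commonf3x4", 1 if common_codon_freqs else 0),
    ]
    return "\n".join(f"{key} = {value}" for key, value in settings) + "\n"


if __name__ == "__main__":
    # Hypothetical file names for one aligned gametolog pair.
    print(yn00_control("gametolog_pair.nuc", "gametolog_pair.yn"))
```

The returned text can be saved as `yn00.ctl` in the working directory before running the `yn00` program on a codon alignment of each gametolog pair.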
Raw reads, genome assemblies, and annotations were deposited at DDBJ/EMBL/GenBank under the following accessions: DRA: DRA004920, DRA002727, and DRA004919; whole genome assembly: BDSI01000001-BDSI01003180, BDSJ01000001-BDSJ01002471, BDSK01000001-BDSK01001897, BDSL01000001-BDSL01001461; annotations: LC314412–LC314416. All the other data generated or analyzed during this study are included in this published article and its Supplementary Information.