Deep genomic analysis of Coelastrella saipanensis (Scenedesmaceae, Chlorophyta): comparative chloroplast genomics of Scenedesmaceae
Many species of the coccoid green algal genus Coelastrella are considered potential candidates for the large-scale production of natural pigments and biofuels. However, little is known about the structural, functional and molecular aspects of the chloroplast genomes (cpDNAs) of this genus. In the present study, the complete cpDNA sequence of strain FACHB-2138, which was identified as Coelastrella saipanensis Hanagata on the basis of morphological and molecular analyses, was determined. The 196 140 bp cpDNA sequence was assembled as a circular map and was found to possess the typical quadripartite structure. The two identical copies of the 11 897 bp inverted repeat (IR) sequence were separated from one another by single copy regions. The large single copy region (LSC) was 104 949 bp, whereas the small single copy region (SSC) was 67 397 bp. The cpDNA encoded a total of 96 unique genes, comprising 67 protein-coding genes, three rRNA genes and 26 tRNA genes. A total of 19 group I introns were annotated in this genome. Comparative analyses with three species from the family Scenedesmaceae showed that C. saipanensis had a slightly expanded genome, a higher GC content and a less skewed distribution of its genes between the two DNA strands than the other three species. The cpDNA data obtained in the present study help to expand our understanding of plant systematics and phylogenetic reconstruction, and to identify possible biotechnological applications of the species belonging to the studied taxa.
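The region lengths and gene counts reported above can be cross-checked with simple arithmetic. The following is a minimal sketch rather than part of the study's own pipeline: the constant and function names are illustrative assumptions, and only the numeric values come from the abstract. It assumes the usual quadripartite layout, in which the circular genome consists of the LSC, the SSC and two copies of the IR.

```python
# Values (in bp) taken from the abstract; all names here are hypothetical.
TOTAL_BP = 196_140
LSC_BP = 104_949
SSC_BP = 67_397
IR_BP = 11_897  # length of ONE inverted repeat copy

# Unique gene counts by class, as reported in the abstract.
GENE_COUNTS = {"protein-coding": 67, "rRNA": 3, "tRNA": 26}
TOTAL_UNIQUE_GENES = 96


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Length of a circular genome made of LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


def region_fractions(lsc: int, ssc: int, ir: int) -> dict[str, float]:
    """Share of the genome occupied by each region (both IR copies pooled)."""
    total = quadripartite_length(lsc, ssc, ir)
    return {
        "LSC": lsc / total,
        "SSC": ssc / total,
        "IR (both copies)": 2 * ir / total,
    }


if __name__ == "__main__":
    computed = quadripartite_length(LSC_BP, SSC_BP, IR_BP)
    print(f"computed genome length: {computed} bp (reported: {TOTAL_BP} bp)")
    for region, share in region_fractions(LSC_BP, SSC_BP, IR_BP).items():
        print(f"{region}: {share:.1%}")
    print(f"unique genes: {sum(GENE_COUNTS.values())} "
          f"(reported: {TOTAL_UNIQUE_GENES})")
```

With the reported values, the four regions add up exactly to the stated genome length, and the three gene classes add up to the stated 96 unique genes.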