

Marine Genomics xxx (xxxx) xxx–xxx

Contents lists available at ScienceDirect

Marine Genomics journal homepage: www.elsevier.com/locate/margen

De novo assembly of the kidney and spleen transcriptomes of the cosmopolitan blue shark, Prionace glauca

André M. Machado a,1, Tereza Almeida b,c,1, Gonzalo Mucientes b,d, Pedro J. Esteves b,c, Ana Verissimo b,e,⁎, L. Filipe C. Castro a,c,⁎⁎

a CIIMAR – Interdisciplinary Centre of Marine and Environmental Research, U. Porto – University of Porto, 4450-208 Matosinhos, Porto, Portugal
b CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Campus Agrário de Vairão, Universidade do Porto, 4485-661 Vairão, Portugal
c Department of Biology, Faculty of Sciences, U. Porto – University of Porto, Portugal
d Instituto de Investigaciones Marinas, IIM-CSIC, Eduardo Cabello 6, 36208 Vigo, Spain
e Virginia Institute of Marine Science, College of William and Mary, Route 1208, Great Road, 23062 Gloucester Point, VA, USA




Keywords: RNA-Seq; Carcharhinids; Oceanic pelagic sharks; Highly migratory; Chondrichthyes; Spleen and kidney

Cartilaginous fishes (sharks, rays and chimaeras) comprise a highly diversified group of basal vertebrates occupying a plethora of aquatic ecological niches. They represent critical components of marine ecosystems and food webs, although numerous species are threatened and almost half are poorly known. Genomic resources emerging from this basal jawed vertebrate group have offered valuable insights into the evolution of vertebrate-specific traits. Yet, the taxon remains largely understudied. Here, we generated the first high-quality de novo assembly of kidney and spleen transcriptomes of the blue shark (Prionace glauca). A total of 32,917,412 and 52,666,542 reads were obtained for spleen and kidney, respectively, using RNA-Seq Illumina technology. De novo multi-tissue assembly resulted in 97,317 unigenes with an N50 of 1975 bp, of which 87,571 were assigned to a particular tissue or combination of tissues based on the sequencing read mapping. Functional annotation generated 28,564 and 19,854 open reading frames in spleen and kidney, respectively. This dataset provides a significant resource for physiological and evolutionary studies, namely into the unique osmoregulatory system of Chondrichthyes and the evolution of the immune system in vertebrates.

1. Introduction

Chondrichthyans, known as cartilaginous fishes, are a basal jawed vertebrate class comprising ~1200 species (Weigmann, 2016) divided into two major subclasses: Holocephali (chimaeras) and Elasmobranchii (sharks and rays). Chondrichthyans thus offer a unique phylogenetic standpoint to address the acquisition of numerous anatomical and physiological innovations in the gnathostome lineage (e.g., Castro et al., 2013; Flajnik and Kasahara, 2010; Freitas et al., 2006). Yet, compared to other vertebrate classes, the amount of genomic resources available is rather limited. Genome sequences are currently available from a holocephalan, the elephant shark (Callorhinchus milii) (Venkatesh et al., 2014), and from an elasmobranch, the little skate (Leucoraja erinacea) (Wang et al., 2012). In recent years, Next Generation Sequencing (NGS) technologies have facilitated the acquisition of additional datasets from chondrichthyan species, particularly through RNA-Seq studies (see Supplementary Table S1 for a summary of published RNA-Seq studies). These have been particularly useful in unveiling molecular and evolutionary patterns of physiological processes such as osmoregulation (Chana-Munoz et al., 2017), adaptive immunity (Marra et al., 2017) and endocrine regulation (Mulley et al., 2014). In the context of a larger project aiming to provide genomic resources for vertebrate comparative genomics, we have used NGS to obtain high-coverage transcriptomes of two key tissues in osmoregulation- and immunity-related physiology, the spleen and kidney, of the blue shark Prionace glauca (Order Carcharhiniformes, family Carcharhinidae). The blue shark is a cosmopolitan species inhabiting both open ocean and coastal waters of temperate and tropical seas (Nakano and Stevens, 2008). It is also a highly migratory species capable of transoceanic migrations associated

⁎ Correspondence to: Ana Verissimo, CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Campus Agrário de Vairão, Universidade do Porto, 4485-661 Vairão, Portugal.
⁎⁎ Correspondence to: L. Filipe C. Castro, CIIMAR – Interdisciplinary Centre of Marine and Environmental Research, U. Porto – University of Porto, 4450-208 Matosinhos, Porto, Portugal.
1 These authors contributed equally to this work.

https://doi.org/10.1016/j.margen.2017.11.009 Received 31 October 2017; Received in revised form 23 November 2017; Accepted 23 November 2017 1874-7787/ © 2017 Elsevier B.V. All rights reserved.

Please cite this article as: Machado, A.M., Marine Genomics (2017), https://doi.org/10.1016/j.margen.2017.11.009


A.M. Machado et al.

extraction. Total RNA (RNAt) of each tissue was extracted using the RNeasy Mini Kit (Qiagen) with a DNase I pre-treatment, and the extracted RNAt was eluted in nuclease-free water, according to the manufacturer's protocol. High-quality kidney and spleen RNAt samples were sequenced using 100 bp paired-end reads on the Illumina HiSeq 2500 platform by STABVIDA, Lda (Caparica, Portugal).

Table 1
MixS descriptors.

Item                    Description
Investigation_type      Eukaryote
Project_name            Blue shark transcriptome of kidney and spleen
Lat_lon                 41.58N 44.07W
Geo_loc_name            Atlantic Ocean
Collection_date         2014-06-14
Biome                   Sea water (ENVO:00002149)
Feature                 Ocean (ENVO:00000015)
Material                Sea water (ENVO:00002149)
Env_package             Water
Seq_meth                Illumina HiSeq
Assembly method         Trinity
Collector               Gonzalo Mucientes
Sex                     Male
Maturity                Immature
Size                    152 cm fork length
Assembly                Trinity v2.4.0
SRA Accession number    Kidney (SRR6188468) & Spleen (SRR6188469)
TSA Accession number    GFYY00000000.1



[Fig. 1, Venn diagram labels: 4340 (13.19%) and 13050 (39.66%).]

2.2. Transcriptome de novo assembly and annotation

Sequencing produced 32,917,412 reads (~32.9 million) for spleen and 52,666,542 reads (~52.7 million) for kidney. After quality filtering, the high-quality reads of both tissue samples were combined and assembled de novo using Trinity v2.4.0 (Grabherr et al., 2011). The assembly comprised a total of 97,317 unigenes and 139,202 transcripts, with an average length of 901 bp and an N50 of 1975 bp (Supplementary Table S2). Importantly, the "unigenes" mentioned here are taken from the Trinity output and correspond to groups of transcripts clustered on the basis of shared sequence content. Before submitting the transcriptome to the Transcriptome Shotgun Assembly (TSA) database of NCBI, and to guarantee the full removal of contamination sources such as vectors, adapters, linkers and PCR primers not identified in the initial read quality-filtering step, one additional quality control step against the UniVec database was performed. Any assembled contigs with strong matches to the UniVec database were considered exogenous to P. glauca and therefore removed from the dataset (see Supplementary file S1 for detailed methods). To address tissue-specific expression patterns, raw reads of each tissue were mapped and quantified against the global transcriptome with Bowtie2 v2.3.0 (Langmead and Salzberg, 2012) and RSEM v.1.2.31 (Li and Dewey, 2011). In this analysis, contigs with abundance values of Transcripts Per Million (TPM) ≥ 1 were considered to be expressed. Therefore, only 113,595 transcripts in 87,571 unigenes, present in one or both tissues, were kept in the filtered assembly for further analyses (Supplementary Table S2; see Supplementary file S1 for detailed methods).
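The two summary quantities used above, the N50 and the TPM ≥ 1 expression threshold, can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: Trinity and RSEM compute these values themselves, and RSEM estimates counts and effective lengths with its own statistical model. The function names, the toy contig lengths and the toy counts are hypothetical.

```python
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() needs at least one contig length")


def tpm(counts: Dict[str, float], eff_lengths: Dict[str, float]) -> Dict[str, float]:
    """Convert read counts to TPM: divide each count by the transcript
    length, then rescale so that all values sum to one million.
    Assumes at least one transcript has a non-zero count."""
    rates = {t: counts[t] / eff_lengths[t] for t in counts}
    scale = sum(rates.values())
    return {t: rate / scale * 1e6 for t, rate in rates.items()}


def expressed(tpm_values: Dict[str, float], threshold: float = 1.0) -> List[str]:
    """Return the sorted IDs of transcripts with TPM >= threshold."""
    return sorted(t for t, value in tpm_values.items() if value >= threshold)


# Toy data (hypothetical, not from the blue shark assembly).
toy_lengths = [100, 200, 300, 400, 500]
toy_counts = {"tx_a": 10.0, "tx_b": 20.0, "tx_c": 0.0}
toy_eff_lengths = {"tx_a": 1000.0, "tx_b": 1000.0, "tx_c": 500.0}

toy_tpm = tpm(toy_counts, toy_eff_lengths)
print(n50(toy_lengths))
print(expressed(toy_tpm))
```

Because TPM values always sum to one million within a sample, a fixed threshold such as TPM ≥ 1 has the same meaning in the spleen and kidney libraries even though their sequencing depths differ.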
To assess the relative quality and completeness of the filtered blue shark assembly, the obtained transcriptome was further compared with a reference vertebrate gene set and five other chondrichthyan transcriptomes using BUSCO (Simão et al., 2015) (see Supplementary file S1 for detailed methods). Only available chondrichthyan transcriptomes assembled with Trinity and with a similar methodology were used to avoid imprecise comparisons. The BUSCO analysis revealed comparable completeness among carcharhiniform transcriptomes, namely P. glauca (81.7% complete; our data) and Scyliorhinus canicula (83.6% complete; Mulley et al., 2014), and higher completeness than several individual tissues of the squaliform shark Squalus acanthias (Chana-Munoz et al., 2017): liver, 65.8% complete; ovary, 69.9% complete; brain, 79.2% complete; kidney, 78.7% complete (see Supplementary Table S3 for more details).

To perform the functional annotation, the filtered assembly was submitted to Trinotate v3.0.1 (http://trinotate.github.io). The Trinotate pipeline includes several annotation categories such as KEGG, protein domains, assignments to orthologous groups of genes (eggNOG) and Gene Ontology (GO). We predicted open reading frames (ORFs) with TransDecoder (http://transdecoder.github.io). This software identified a total of 32,904 contigs in 21,492 unigenes with an ORF encoding ≥100 amino acids. The tissue distribution of transcripts with ORFs showed a total of 28,564 and 19,854 transcripts in the kidney and spleen, respectively (Fig. 1). Although obtained from different tissues, these values are in the range of the number of transcripts with ORF mapped per tissue reported by Mulley et al. (2014) for S. canicula (brain, 33,896; liver, 13,115; pancreas, 17,648). The numbers of transcripts with significant (E-value cutoff of 1e-5) blastx and blastp hits against the UniProtKB/Swiss-Prot database, 26,798 and 26,757 respectively, are summarized in Supplementary Table S4.
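As an illustration of how an E-value cutoff such as the 1e-5 used above can be applied, the following Python sketch counts the query transcripts with at least one significant hit. It assumes the search results are in the standard 12-column tabular BLAST layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the output format actually used in this study is not stated here, and the function name and the example rows are hypothetical.

```python
from typing import Iterable, Set

# 0-based position of "evalue" in the standard 12-column tabular layout
# (an assumption about the file format, see the text above).
EVALUE_COLUMN = 10


def queries_with_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Set[str]:
    """Return the IDs of query sequences with at least one hit whose
    E-value is at or below max_evalue."""
    hits = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if float(fields[EVALUE_COLUMN]) <= max_evalue:
            hits.add(fields[0])
    return hits


# Hypothetical example rows: two significant hits for one transcript,
# one non-significant hit for another.
example = [
    "tx_1\tsp|P00001\t85.0\t120\t18\t0\t1\t360\t5\t124\t1e-60\t210",
    "tx_1\tsp|P00002\t60.0\t100\t40\t0\t1\t300\t9\t108\t3e-12\t70",
    "tx_2\tsp|P00003\t35.0\t40\t26\t0\t10\t129\t2\t41\t2e-3\t31",
]
print(sorted(queries_with_hits(example)))
```

Counting distinct query IDs rather than rows matters because a single transcript can match several database entries.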
To complement the functional annotation, the contigs of the filtered assembly were queried against

[Fig. 1, Venn diagram label: 15514 (47.15%).]

Fig. 1. Tissue distribution of transcripts with ORF, as determined by mapping the sequencing reads derived from each tissue to a combined, global tissue assembly. Transcript TPM values ≥ 1 were taken as evidence of expression.
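The three Venn diagram counts of Fig. 1 (4340, 13,050 and 15,514) can be reconciled with the per-tissue totals reported in the text (28,564 and 19,854 transcripts with ORF) by simple arithmetic. The short Python sketch below, added here only as a consistency check, tries each count as the shared class and keeps the assignment that reproduces both totals; the variable names are hypothetical.

```python
from itertools import permutations

venn_counts = (4340, 13050, 15514)   # the three labels of Fig. 1
reported_totals = {28564, 19854}     # per-tissue totals given in the text

# Try each count as the "shared by both tissues" class; the condition
# a > b only removes mirrored duplicates of the same assignment.
consistent = [
    (shared, a, b)
    for shared, a, b in permutations(venn_counts)
    if a > b and {shared + a, shared + b} == reported_totals
]

shared, only_larger, only_smaller = consistent[0]
total = shared + only_larger + only_smaller
shares = {
    "shared": round(100 * shared / total, 2),
    "only_larger": round(100 * only_larger / total, 2),
    "only_smaller": round(100 * only_smaller / total, 2),
}
print(consistent, total, shares)
```

Only one assignment is consistent: 15,514 transcripts are shared by both tissues, 13,050 are specific to the tissue with 28,564 transcripts and 4340 to the tissue with 19,854. The three classes sum to the 32,904 contigs with an ORF, and the recomputed percentages match the figure labels.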

with ontogeny and reproduction, as well as seasonal latitudinal movements associated with feeding (reviewed in Nakano and Stevens, 2008; Queiroz et al., 2012). Its life history strategy is highly derived within carcharhinid sharks, which are mostly coastal in habit (Compagno et al., 2005), although numerous neonates in shallow inshore waters have been reported for this species recently (Bañón et al., 2016). Blue sharks are thus exposed to distinct environmental conditions throughout an individual's lifespan, with likely consequences on physiological pathways to cope with such a life strategy. Our transcriptome datasets were assembled and annotated, and provide relevant information for comparative analysis across the vertebrate tree of life, and to investigate the biology, physiology and ecology of this remarkable species.

2. Data description

2.1. Sampling, RNA extraction and Illumina sequencing

One specimen of P. glauca was obtained from North Atlantic waters during commercial longline fishing operations (Table 1). The kidney and spleen tissues were sampled immediately upon hauling of the specimen on board, stored in RNAlater and kept at −20 °C until RNA


Fig. 2. Transcripts mapping the top 25 GO terms as a percentage of all transcripts per tissue generated by the assembly.

[Bar chart. GO terms shown on the axis: Transcription factor activity, sequence-specific DNA binding; Extracellular region; Endoplasmic reticulum; Positive regulation of transcription from RNA polymerase II promoter; Endoplasmic reticulum membrane; Integral component of plasma membrane; Golgi apparatus; Extracellular space; Zinc ion binding; Nucleolus; Mitochondrion; Regulation of transcription, DNA-templated; DNA binding; RNA binding; Membrane; ATP binding; Extracellular exosome; Transcription, DNA-templated; Nucleoplasm; Metal ion binding; Plasma membrane; Integral component of membrane; Cytosol; Cytoplasm; Nucleus.]

deposited at DDBJ/ENA/GenBank under the accession GFYY00000000 (https://www.ncbi.nlm.nih.gov/nuccore/GFYY00000000.1). The version described in this paper is the first version, GFYY01000000. Filtered assembly and pools of tissue-specific mapped contigs have been deposited in the figshare digital repository (https://figshare.com/s/8476db7b0748a8fdbb2d).

Supplementary data to this article can be found online at https://doi.org/10.1016/j.margen.2017.11.009.

the Non-Redundant database of NCBI using the blastx tool of DIAMOND v0.8.36 (Buchfink et al., 2014). The results showed 41,641 transcripts with blastx hits, with > 70% of transcript hits corresponding to two species of chondrichthyans, Rhincodon typus (56.91%) and C. milii (14.89%). Supplementary Fig. 1 and Supplementary Table S5 display the top 10 species for which there was a top blast hit. We could assign GO terms to a total of 26,340 transcripts. To provide a broader overview of the assigned gene ontology (GO) terms, the top 25 GO terms and their transcript percentages per tissue are presented in Supplementary Tables S6 and S7 (see Supplementary file S1 for detailed methods). Regarding the three existing structured ontologies (molecular function, cellular component and biological process), sixteen out of the twenty-five representative GO terms belonged to cellular component, while six and three belonged to molecular function and biological process, respectively (Fig. 2; Supplementary Table S7).
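The split of the top 25 GO terms into sixteen, six and three can be reproduced by tallying the terms shown in Fig. 2 by ontology. The Python sketch below does this with a hand-written mapping; the ontology assigned to each term follows standard Gene Ontology usage and is supplied here for illustration, not taken from the supplementary tables.

```python
from collections import Counter

CC, MF, BP = "cellular component", "molecular function", "biological process"

# Top 25 GO terms of Fig. 2 with the ontology each belongs to
# (mapping written by hand for this illustration).
GO_NAMESPACE = {
    "Nucleus": CC,
    "Cytoplasm": CC,
    "Cytosol": CC,
    "Integral component of membrane": CC,
    "Plasma membrane": CC,
    "Nucleoplasm": CC,
    "Extracellular exosome": CC,
    "Membrane": CC,
    "Mitochondrion": CC,
    "Nucleolus": CC,
    "Extracellular space": CC,
    "Golgi apparatus": CC,
    "Integral component of plasma membrane": CC,
    "Endoplasmic reticulum membrane": CC,
    "Endoplasmic reticulum": CC,
    "Extracellular region": CC,
    "Metal ion binding": MF,
    "ATP binding": MF,
    "RNA binding": MF,
    "DNA binding": MF,
    "Zinc ion binding": MF,
    "Transcription factor activity, sequence-specific DNA binding": MF,
    "Transcription, DNA-templated": BP,
    "Regulation of transcription, DNA-templated": BP,
    "Positive regulation of transcription from RNA polymerase II promoter": BP,
}

# Count how many of the 25 terms fall in each ontology.
tally = Counter(GO_NAMESPACE.values())
print(len(GO_NAMESPACE), dict(tally))
```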

Acknowledgments

We acknowledge Fundação para a Ciência e a Tecnologia for the support to T.A. (SFRH/BD/108253/2015), P.J.E. (IF/00376/2015) and A.V. (SFRH/BPD/77487/2011). L.F.C.C. and A.M.M. research is supported by the MarInfo – Integrated Platform for Marine Data Acquisition and Analysis (reference NORTE-01-0145-FEDER-000031), a project supported by the North Portugal Regional Operational Program (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF). G.M. was supported by a fellowship from the project PELAGICS (PTDC/MAR-BIO/4458/2012) funded by Portuguese national funds through FCT/MCTES (PIDDAC) and co-funded by the European Regional Development Fund (FEDER) through COMPETE – Programa Operacional Factores de Competitividade (POFC). The biological material used here was also collected under the project PELAGICS.

3. Conclusions

The results presented here comprise an important resource available for future investigations into the chondrichthyan osmoregulatory system and the evolution of the immune system in vertebrates, and pave the way for a full genome project.

3.1. Data deposition

The obtained raw RNA-Seq data were deposited in the NCBI Sequence Read Archive (SRA) under project number PRJNA414891 (accession numbers SRR6188468 for kidney and SRR6188469 for spleen). The Transcriptome Shotgun Assembly project has been

References

Bañón, R., Maño, T., Mucientes, G., 2016. Observations of newborn blue sharks Prionace glauca in shallow inshore waters of the north-east Atlantic Ocean. J. Fish Biol. 89, 2167–2177. http://dx.doi.org/10.1111/jfb.13082.

Buchfink, B., Xie, C., Huson, D.H., 2014. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60. http://dx.doi.org/10.1038/nmeth.3176.

Castro, L.F.C., Goncalves, O., Mazan, S., Tay, B.-H., Venkatesh, B., Wilson, J.M., 2013. Recurrent gene loss correlates with the evolution of stomach phenotypes in gnathostome history. Proc. R. Soc. B Biol. Sci. 281, 20132669. http://dx.doi.org/10.1098/rspb.2013.2669.

Chana-Munoz, A., Jendroszek, A., Sønnichsen, M., Kristiansen, R., Jensen, J.K., Andreasen, P.A., Bendixen, C., Panitz, F., 2017. Multi-tissue RNA-seq and transcriptome characterisation of the spiny dogfish shark (Squalus acanthias) provides a molecular tool for biological research and reveals new genes involved in osmoregulation. PLoS One 12, e0182756. http://dx.doi.org/10.1371/journal.pone.0182756.

Compagno, L.J.V., Dando, M., Fowler, S.L., 2005. Sharks of the World. Princeton University Press.

Flajnik, M.F., Kasahara, M., 2010. Origin and evolution of the adaptive immune system: genetic events and selective pressures. Nat. Rev. Genet. 11, 47–59. http://dx.doi.org/10.1038/nrg2703.

Freitas, R., Zhang, G., Cohn, M.J., 2006. Evidence that mechanisms of fin development evolved in the midline of early vertebrates. Nature 442, 1033–1037. http://dx.doi.org/10.1038/nature04984.

Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., Thompson, D.A., Amit, I., Adiconis, X., Fan, L., Raychowdhury, R., Zeng, Q., Chen, Z., Mauceli, E., Hacohen, N., Gnirke, A., Rhind, N., di Palma, F., Birren, B.W., Nusbaum, C., Lindblad-Toh, K., Friedman, N., Regev, A., 2011. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652. http://dx.doi.org/10.1038/nbt.1883.

Langmead, B., Salzberg, S.L., 2012. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359. http://dx.doi.org/10.1038/nmeth.1923.

Li, B., Dewey, C.N., 2011. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinforma. 12, 323. http://dx.doi.org/10.1186/1471-2105-12-323.

Marra, N.J., Richards, V.P., Early, A., Bogdanowicz, S.M., Pavinski Bitar, P.D., Stanhope, M.J., Shivji, M.S., 2017. Comparative transcriptomics of elasmobranchs and teleosts highlight important processes in adaptive immunity and regional endothermy. BMC Genomics 18, 87. http://dx.doi.org/10.1186/s12864-016-3411-x.

Mulley, J.F., Hargreaves, A.D., Hegarty, M.J., Heller, R., Swain, M.T., 2014. Transcriptomic analysis of the lesser spotted catshark (Scyliorhinus canicula) pancreas, liver and brain reveals molecular level conservation of vertebrate pancreas function. BMC Genomics 15, 1074. http://dx.doi.org/10.1186/1471-2164-15-1074.

Nakano, H., Stevens, J.D., 2008. The biology and ecology of the blue shark, Prionace glauca. In: Sharks of the Open Ocean. Blackwell Publishing Ltd., Oxford, UK, pp. 140–151. http://dx.doi.org/10.1002/9781444302516.ch12.

Queiroz, N., Humphries, N.E., Noble, L.R., Santos, A.M., Sims, D.W., 2012. Spatial dynamics and expanded vertical niche of blue sharks in oceanographic fronts reveal habitat targets for conservation. PLoS One 7, e32374. http://dx.doi.org/10.1371/journal.pone.0032374.

Simão, F.A., Waterhouse, R.M., Ioannidis, P., Kriventseva, E.V., Zdobnov, E.M., 2015. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212. http://dx.doi.org/10.1093/bioinformatics/btv351.

Venkatesh, B., Lee, A.P., Ravi, V., Maurya, A.K., Lian, M.M., Swann, J.B., Ohta, Y., Flajnik, M.F., Sutoh, Y., Kasahara, M., Hoon, S., Gangu, V., Roy, S.W., Irimia, M., Korzh, V., Kondrychyn, I., Lim, Z.W., Tay, B.-H., Tohari, S., Kong, K.W., Ho, S., Lorente-Galdos, B., Quilez, J., Marques-Bonet, T., Raney, B.J., Ingham, P.W., Tay, A., Hillier, L.W., Minx, P., Boehm, T., Wilson, R.K., Brenner, S., Warren, W.C., 2014. Elephant shark genome provides unique insights into gnathostome evolution. Nature 505, 174–179. http://dx.doi.org/10.1038/nature12826.

Wang, Q., Arighi, C.N., King, B.L., Polson, S.W., Vincent, J., Chen, C., Huang, H., Kingham, B.F., Page, S.T., Rendino, M.F., Thomas, W.K., Udwary, D.W., Wu, C.H., Team, the N.E.B.C.C, 2012. Community annotation and bioinformatics workforce development in concert – little skate genome annotation workshops and jamborees. Database J. Biol. Databases Curation 2012. http://dx.doi.org/10.1093/DATABASE/BAR064.

Weigmann, S., 2016. Annotated checklist of the living sharks, batoids and chimaeras (Chondrichthyes) of the world, with a focus on biogeographical diversity. J. Fish Biol. 88, 837–1037. http://dx.doi.org/10.1111/jfb.12874.

