CutProtFam-Pred: DETECTION AND CLASSIFICATION OF PUTATIVE STRUCTURAL CUTICULAR PROTEINS FROM SEQUENCE ALONE, BASED ON PROFILE HIDDEN MARKOV MODELS
Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Panepistimiopolis, Athens 157 01
The arthropod cuticle is a composite, bipartite system, made of chitin filaments embedded in a proteinaceous matrix. The physical properties of cuticle are determined by the structure and the interactions of its two major components, cuticular proteins (CPs) and chitin. The proteinaceous matrix consists mainly of structural cuticular proteins. The majority of the structural proteins discovered to date belong to the CPR family, and they are identified by the conserved R&R region (Rebers and Riddiford Consensus). Three sub-families of the CPR family, RR-1, RR-2 and RR-3, have also been identified, based on conservation at the sequence level and some correlation with the cuticle type. Recently, several novel families, each containing characteristic conserved regions, have been described: CPF (a conserved region of 44 amino acids); CPFL (a conserved C-terminal region similar to that of CPF); five low-complexity families: Tweedle, CPLCA, CPLCG, CPLCP, CPLCW; CPCFC (2 or 3 C-x(5)-C repeats); CPG (rich in glycines); CPAP3 and CPAP1 (analogous to peritrophins, with 3 and 1 chitin-binding domains, respectively). The package HMMER v3.0 (http://hmmer.janelia.org/) was used to build profile Hidden Markov Models from the characteristic regions of these families, for the families where this was possible (CPF, CPCFC, CPLCA, CPLCG, CPLCW, Tweedle, CPAP3, CPAP1). Using these models, together with the models previously created for CPR-RR1 and CPR-RR2 (Karouzou et al., 2007), we developed CutProtFam-Pred (Ioannidou et al., 2014), an on-line tool (http://bioinformatics.biol.uoa.gr/CutProtFam-Pred/) that allows the accurate detection and classification of putative structural cuticular proteins in proteomes, from sequence alone.
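As an illustration of how profile HMM hits can be turned into a family assignment, the following minimal Python sketch reads the tabular output that HMMER 3 writes with the `--tblout` option of `hmmsearch` and keeps, for each protein, the family model with the highest score. It is not the procedure used by CutProtFam-Pred itself: the function name, the E-value cutoff of 1e-5, and the assumption that each profile is named after its family are all hypothetical choices made for the example, and the sample lines are abbreviated to the first columns of a real table.

```python
from typing import Dict, Tuple


def best_family_per_protein(
    tblout_text: str, max_evalue: float = 1e-5
) -> Dict[str, Tuple[str, float]]:
    """Return {protein: (family, score)} for the best-scoring profile hit.

    Assumes hmmsearch --tblout layout: column 1 is the target sequence,
    column 3 the query profile, column 5 the full-sequence E-value and
    column 6 the full-sequence bit score.
    The max_evalue cutoff is an illustrative assumption, not a published value.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in tblout_text.splitlines():
        # Skip blank lines and the comment/header lines that start with '#'.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 6:
            continue  # malformed or truncated row
        protein, family = fields[0], fields[2]
        evalue, score = float(fields[4]), float(fields[5])
        if evalue > max_evalue:
            continue  # hit too weak to consider
        # Keep only the highest-scoring family for each protein.
        if protein not in best or score > best[protein][1]:
            best[protein] = (family, score)
    return best


# Hypothetical, abbreviated rows (real tables carry more columns per row).
SAMPLE = """\
# target name  accession  query name  accession  E-value  score  bias
protA - CPF - 1.2e-20 70.1 0.3
protA - Tweedle - 3.0e-06 25.4 0.1
protB - CPAP3 - 4.5e-30 101.7 1.2
protC - CPLCG - 0.5 8.9 0.0
"""

if __name__ == "__main__":
    for protein, (family, score) in best_family_per_protein(SAMPLE).items():
        print(protein, family, score)
```

Taking the single best-scoring model per protein is the simplest possible rule; a protein that matches several family models with similar scores would deserve manual inspection rather than an automatic call.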
****NEW (added in May 2023): When our tool assigns a protein to the CPAP1 or CPAP3 family, it is essential to check that the sequence contains no more than 1 or 3 CBM_14 domains, respectively. This can be done quickly with NCBI Conserved Domain Search [https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi].
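The check described in this note can be sketched in a few lines of Python. The example assumes that the domain names reported for a sequence have already been collected by hand from a Conserved Domain Search result into a plain list; the function name and the input format are hypothetical and are not part of CutProtFam-Pred or of the NCBI service.

```python
from typing import Iterable

# Maximum number of CBM_14 domains allowed for each family, as stated in the note.
MAX_CBM14 = {"CPAP1": 1, "CPAP3": 3}


def cpap_assignment_plausible(family: str, domain_names: Iterable[str]) -> bool:
    """Return True if the CBM_14 count does not exceed the limit for the family.

    domain_names is a list of domain names found in one sequence
    (hypothetical input, e.g. copied from a domain search result).
    """
    if family not in MAX_CBM14:
        raise ValueError(f"check only applies to CPAP1 and CPAP3, got {family!r}")
    n_cbm14 = sum(1 for name in domain_names if name == "CBM_14")
    return n_cbm14 <= MAX_CBM14[family]


if __name__ == "__main__":
    # A sequence assigned to CPAP1 but carrying two CBM_14 domains is flagged.
    print(cpap_assignment_plausible("CPAP1", ["CBM_14", "CBM_14"]))
    print(cpap_assignment_plausible("CPAP3", ["CBM_14", "CBM_14", "CBM_14"]))
```

A sequence for which the function returns False has more CBM_14 domains than the assigned family allows, so the assignment should be re-examined.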
Karouzou M.V., Spyropoulos Y., Iconomidou V.A., Cornman R.S., Hamodrakas S.J. and Willis J.H. (2007). "Drosophila cuticular proteins with the R&R Consensus: annotation and classification with a new tool for discriminating RR-1 and RR-2 sequences." Insect Biochem Mol Biol 37(8): 754-760.
Ioannidou Z.S., Theodoropoulou M.C., Papandreou N.C., Willis J.H. and Hamodrakas S.J. (2014). "CutProtFam-Pred: detection and classification of putative structural cuticular proteins from sequence alone, based on profile hidden Markov models." Insect Biochem Mol Biol 52: 51-59.