GPCRpipe

About GPCRpipe

Basic Theory

G-protein coupled receptors (GPCRs) represent the largest and most diverse superfamily of transmembrane receptors in eukaryotic cells, with nearly 800 genes encoding such receptors in the human genome (Bjarnadóttir, et al., 2006). At the interface between the extracellular and intracellular milieu, they are involved in the regulation of nearly every physiological process by converting extracellular stimuli into intracellular responses. Most GPCR functions are mediated by a special group of proteins called G-proteins. G-proteins act as switches that transduce messages from the extracellular space into the cell through their interaction with GPCRs (Kristiansen, 2004). G-proteins interact with various effector molecules to immediately change the concentrations of cellular molecules, leading eventually to a wide range of cellular and physiological responses (Oldham and Hamm, 2008).

All known members of the GPCR superfamily share a common topology: seven transmembrane α-helices, three intracellular and three extracellular loops, an extracellular N-terminus and an intracellular C-terminus. Despite their common architecture and function, GPCRs show considerable diversity at the sequence level. This lack of sequence similarity makes it difficult to detect GPCRs in proteomes and, especially, to find novel members of the GPCR superfamily.

Here we present GPCRpipe, a pipeline for the accurate detection of GPCRs in proteomes, based on previously published tools and a hidden Markov model (HMM) specially designed to detect GPCRs.

How to run GPCRpipe?

GPCRpipe is easy to use. On the Run page of the tool, the user either pastes one or more sequences in FASTA format into the text box (up to 100 per run) or uploads a text file containing the query sequences, then chooses one of the two available methods (AND/OR) using the respective radio button and, finally, presses the "Submit" button (Figure 1).
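
Before submitting, it can be convenient to check locally that the input is valid FASTA and respects the limit of 100 sequences per run. The following Python sketch is not part of GPCRpipe; the function names parse_fasta and check_submission are hypothetical, and the server may apply additional checks that are not described here.

```python
MAX_SEQUENCES = 100  # per-run limit stated for the Run page


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_submission(text, limit=MAX_SEQUENCES):
    """Parse the input and raise ValueError if it is empty or over the limit."""
    records = parse_fasta(text)
    if not records:
        raise ValueError("no FASTA records found")
    if len(records) > limit:
        raise ValueError(
            f"{len(records)} sequences submitted; the limit is {limit} per run"
        )
    return records
```

Larger data sets can be split into batches of at most 100 records and submitted as separate runs.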

Figure 1. The Run page of GPCRpipe

Within a couple of minutes the Result page is returned to the user (Figure 2). It reports whether each submitted sequence is predicted to be a GPCR. For each predicted GPCR, further information is provided: the HMM Reliability Score, the GPCR-specific Pfam profile with its limits, the signal peptide predicted by SignalP 4.0 (Petersen, et al., 2011), the coupling specificity predicted by PRED-COUPLE2 (Sgourakis, et al., 2005), the intracellular and/or extracellular domain profiles with their respective limits and, finally, the topology predicted by the HMM. At the top of the Result page, a text output file covering all the query sequences is provided.

Figure 2. The Result Page of GPCRpipe

How does GPCRpipe work?

GPCRpipe consists of two layers (Figure 3). The first layer is GPCR detection, which is the main feature of the pipeline. Two methods are used to detect GPCRs in a set of proteins: (a) an HMM specially designed for the detection of GPCRs (Figure 4), which is a modified version of the model used by the HMM-TM algorithm (Bagos, et al., 2006), and (b) a library of 35 Pfam profile HMMs (pHMMs; Table 1) that are specific for different families of GPCRs (Punta, et al., 2012). A protein is considered a GPCR when it is predicted by both methods, and only sequences longer than 200 amino acids can be detected as such. The second layer of the pipeline provides feature annotation for every predicted GPCR using preexisting tools.
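
The decision rule of the detection layer can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function name is_gpcr and its boolean inputs are hypothetical, and the behaviour of the OR option (acceptance when either method is positive) is an assumption inferred from the name of the radio button on the Run page.

```python
MIN_LENGTH = 200  # only sequences longer than this can be reported as GPCRs


def is_gpcr(seq_length, hmm_positive, pfam_positive, mode="AND"):
    """Combine the two detection methods into a single call.

    hmm_positive:  True if the GPCR-specific HMM predicts a GPCR.
    pfam_positive: True if at least one of the 35 Pfam pHMMs matches.
    mode:          "AND" requires both methods; "OR" (assumed semantics)
                   accepts a sequence predicted by either method.
    """
    if seq_length <= MIN_LENGTH:
        return False
    if mode == "AND":
        return hmm_positive and pfam_positive
    if mode == "OR":
        return hmm_positive or pfam_positive
    raise ValueError(f"unknown mode: {mode!r}")
```

Under these assumptions, the AND option favours specificity, since a sequence must satisfy both methods, while the OR option favours sensitivity.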

Figure 3. Workflow of the GPCRpipe, consisting of two layers, GPCR detection and Feature Annotation

The model is cyclic and consists of 114 states, including begin (B) and end (E) states. It comprises three sub-models corresponding to the three labels to be predicted: the TM (transmembrane) helix sub-model and the inner and outer loop sub-models. The TM helix sub-model incorporates states that model the architecture of the transmembrane helices: there are states corresponding to the core of the helix and to the cap located at the lipid bilayer interface. All states are connected with the appropriate transition probabilities in order to be consistent with the known structures, that is, to ensure an appropriate length distribution. The inner and outer loops are modeled with a "ladder" architecture. At the top of each ladder is a self-transitioning state corresponding to residues too distant from the membrane; these cannot be modeled as loops, hence that state is named "globular". The model was trained using the Baum-Welch algorithm for labeled sequences (Krogh, 1994), and decoding was performed using the Optimal Accuracy Posterior Decoder (Käll, et al., 2005).
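
Decoding assigns one of the three labels to every residue. The sketch below shows how such a per-residue labeling can be collapsed into topology segments and checked against the canonical GPCR architecture (seven TM helices, extracellular N-terminus, intracellular C-terminus). The single-letter encoding (I for inner loop, M for TM helix, O for outer loop) and the function names are assumptions made for illustration; the output format of GPCRpipe itself may differ.

```python
from itertools import groupby


def segments(labels):
    """Collapse a per-residue label string into (label, start, end) tuples.

    Positions are 1-based and inclusive.
    """
    out = []
    pos = 1
    for label, run in groupby(labels):
        length = len(list(run))
        out.append((label, pos, pos + length - 1))
        pos += length
    return out


def count_tm_helices(labels):
    """Count the runs of the TM label 'M'."""
    return sum(1 for label, _, _ in segments(labels) if label == "M")


def has_gpcr_topology(labels):
    """True for 7 TM helices with an outer N-terminus and an inner C-terminus."""
    return (
        count_tm_helices(labels) == 7
        and labels.startswith("O")
        and labels.endswith("I")
    )
```

For example, segments("OOOMMMMIII") splits a ten-residue labeling into an outer segment (1-3), a TM helix (4-7) and an inner segment (8-10).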

Figure 4. A schematic representation of the model's architecture. The model consists of three sub-models denoted by the labels: Cytoplasmic loop, Transmembrane Helix and Extracellular loop. Within each sub-model, states with the same shape, size and colour share the same emission probabilities (parameter tying). Allowed transitions are indicated with arrows.

Table 1. The 35 Pfam families (pHMMs) comprising the pHMM library used in GPCRpipe.

Pfam ID	Pfam Accession	Pfam ID	Pfam Accession
7tm_1	PF00001	DUF1182	PF06681
7tm_2	PF00002	DUF621	PF04789
7tm_3	PF00003	Frizzled	PF01534
7TM_GPCR_Srab	PF10292	Git3	PF11710
7TM_GPCR_Sra	PF02117	GpcrRhopsn4	PF10192
7TM_GPCR_Srbc	PF10316	Lung_7-TM_R	PF06814
7TM_GPCR_Srb	PF02175	Ocular_alb	PF02101
7TM_GPCR_Srd	PF10317	Serpentine_r_xa	PF03383
7TM_GPCR_Srh	PF10318	Sre	PF03125
7TM_GPCR_Sri	PF10327	Srg	PF02118
7TM_GPCR_Srj	PF10319	STE2	PF02116
7TM_GPCR_Srsx	PF10320	STE3	PF02076
7TM_GPCR_Srt	PF10321	TAS2R	PF05296
7TM_GPCR_Sru	PF10322	V1R	PF03402
7TM_GPCR_Srv	PF10323	Dicty_CAR	PF05462
7TM_GPCR_Srw	PF10324	7tm_4	PF13853
7TM_GPCR_Srx	PF10328
7TM_GPCR_Srz	PF10325
7TM_GPCR_Str	PF10326

In the feature annotation phase, further information is provided for each predicted GPCR:

1. Family annotation using the best hit pHMM from the Pfam library (Punta, et al., 2012) (Table 1)

2. Coupling specificity to certain families of G-proteins using the PRED-COUPLE2 algorithm (Sgourakis, et al., 2005)

3. Topology provided by (a) the HMM, (b) the occurrence of Pfam profiles with known extra- or intra-cellular position and (c) the signal peptide's position provided by SignalP 4.0 (Petersen, et al., 2011)
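
As an illustration of the first annotation step, the sketch below selects the best-scoring family for each query from a table of profile hits. It assumes the hits come from HMMER's hmmscan run with the --tblout option, whose whitespace-separated columns begin with target name, target accession, query name, query accession, full-sequence E-value and full-sequence bit score; the text above does not state which search program GPCRpipe uses, so this is an assumption. The function name best_hits and the sample values are made up for the example.

```python
def best_hits(tblout_text):
    """Map each query to its highest-scoring hit.

    Returns {query: (family_id, accession, evalue, score)}.
    """
    best = {}
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        family, accession, query = fields[0], fields[1], fields[2]
        evalue, score = float(fields[4]), float(fields[5])
        if query not in best or score > best[query][3]:
            best[query] = (family, accession, evalue, score)
    return best


# Made-up rows in the assumed column layout (values are illustrative only).
sample = """\
# target name  accession  query name  accession  E-value  score  bias ...
7tm_1 PF00001.21 seq1 - 1.2e-50 170.3 10.1 1.5e-50 170.0 10.1 1.0 1 0 0 1 1 1 1 example
7tm_4 PF13853.6 seq1 - 3.0e-10 40.2 5.0 4.0e-10 39.8 5.0 1.0 1 0 0 1 1 1 1 example
"""

for query, (family, accession, evalue, score) in best_hits(sample).items():
    print(query, family, accession, evalue, score)
```

Ranking by bit score rather than E-value is a design choice of this sketch; for hits against the same database the two orderings normally agree.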

The Pfam profiles with known extracellular or intracellular position were gathered manually. Tables 2 and 3 list the 153 extracellular and the 104 intracellular domain profiles, respectively.
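
The way such profiles can support a predicted topology is sketched below: each domain hit is looked up in the extracellular or intracellular list and compared with the labels that the HMM assigned to the same residues. Only a few entries from Tables 2 and 3 are included, and the function names, the I/M/O label encoding and the majority rule are assumptions for illustration rather than a description of GPCRpipe's internal logic.

```python
# A few entries taken from Tables 2 and 3 (accession -> Pfam ID).
EXTRACELLULAR = {"PF01094": "ANF_receptor", "PF01392": "Fz", "PF02793": "HRM"}
INTRACELLULAR = {"PF00595": "PDZ", "PF00069": "Pkinase"}


def domain_side(accession):
    """Return 'extracellular', 'intracellular' or None for a Pfam accession."""
    acc = accession.split(".")[0]  # drop a version suffix such as ".21"
    if acc in EXTRACELLULAR:
        return "extracellular"
    if acc in INTRACELLULAR:
        return "intracellular"
    return None


def check_against_topology(hits, labels):
    """Compare domain hits with a per-residue topology string.

    hits:   list of (accession, start, end), 1-based inclusive coordinates.
    labels: string of 'I' (inner), 'M' (membrane) and 'O' (outer) labels.
    Returns (accession, side, agrees) for every hit with a known side, where
    agrees is True if more than half of the covered residues carry the
    expected label (assumed rule).
    """
    expected = {"extracellular": "O", "intracellular": "I"}
    report = []
    for accession, start, end in hits:
        side = domain_side(accession)
        if side is None:
            continue  # profile not in either list
        region = labels[start - 1:end]
        agrees = region.count(expected[side]) > len(region) / 2
        report.append((accession, side, agrees))
    return report
```

A disagreement flagged in this way would suggest that either the predicted orientation or the domain assignment deserves a closer look.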

Table 2. The 153 extracellular Pfam profiles used in GPCRpipe.

Pfam ID	Pfam Accession	Pfam ID	Pfam Accession
Beta_HSD	PF01073	TarH	PF02203
ANF_receptor	PF01094	Thyroglobulin_1	PF00086
Abhydrolase_3	PF07859	Tissue_fac	PF01108
Activin_recp	PF01064	Tme5_EGF_like	PF09064
AhpC-TSA	PF00578	V-set_CD47	PF08204
Alpha-mann_mid	PF09261	YscJ_FliF_C	PF08345
Amidase_2	PF01510	YycI	PF09648
C1-set	PF07654	adh_short	PF00106
C2-set_2	PF08205	Asp	PF00026
C2-set	PF05790	Branch	PF02485
CD4-extracel	PF09191	Cadherin	PF00028
Cache_1	PF02743	Calreticulin	PF00262
Cadherin_2	PF08266	Carb_anhydrase	PF00194
Cadherin_pro	PF08758	CIMR	PF00878
CtaG_Cox11	PF04442	CN_hydrolase	PF00795
Cu-oxidase_2	PF07731	COesterase	PF00135
Cu-oxidase_3	PF07732	COX2	PF00116
DPPIV_N	PF00930	Cu-oxidase	PF00394
DUF1968	PF09291	CUB	PF00431
DUF1986	PF09342	Cytochrom_C	PF00034
DotA	PF11388	DSL	PF01414
DsbC	PF11412	EGF_2	PF07974
Duffy_binding	PF05424	EGF	PF00008
EGF_CA	PF07645	Ephrin	PF00812
Ephrin_lbd	PF01404	fn2	PF00040
Epimerase	PF01370	Fringe	PF02434
EpoR_lig-bind	PF09067	Furin-like	PF00757
Evr1_Alr	PF04777	Fz	PF01392
F5_F8_type_C	PF00754	Glycos_transf_2	PF00535
FAD_binding_8	PF08022	Hemopexin	PF00045
FG-GAP	PF01839	HlyD	PF00529
FecR	PF04773	HRM	PF02793
FtsQ	PF03799	ICAM_N	PF03921
GDA1_CD39	PF01150	ig	PF00047
GSPII_G	PF08334	Integrin_beta	PF00362
GSPII_IJ	PF02501	KR	PF08659
Glyco_hydro_31	PF01055	Ldl_recept_a	PF00057
Glyco_hydro_38C	PF07748	Ldl_recept_b	PF00058
Glyco_hydro_38	PF01074	Lectin_C	PF00059
Glyco_hydro_47	PF01532	LRR_1	PF00560
Glyco_transf_15	PF01793	LRRNT	PF01462
Glyco_transf_43	PF03360	Malectin	PF11721
Glyco_transf_64	PF09258	MAM	PF00629
Glyco_transf_6	PF03414	MHC_I	PF00129
GspJ	PF11612	MHC_II_alpha	PF00993
GspK	PF03934	MHC_II_beta	PF00969
Hema_HEFG	PF02710	NOD	PF06816
Hema_esterase	PF03996	NODP	PF07684
Hepsin-SRCR	PF09272	OmpA	PF00691
Herpes_glycop_D	PF01537	PD40	PF07676
I-set	PF07679	Peptidase_M10	PF00413
IL6Ra-bind	PF09240	Peptidase_S10	PF00450
Ig_Tie2_1	PF10430	Peptidase_S24	PF00717
Integrin_alpha2	PF08441	Peptidase_S9	PF00326
Interfer-bind	PF09294	PKD	PF00801
Lectin_leg-like	PF03388	POTRA_1	PF08478
Lep_receptor_Ig	PF06328	PSI	PF01437
MHCassoc_trimer	PF08831	Redoxin	PF08534
NAD_binding_4	PF07993	Rieske	PF00355
NCD3G	PF07562	Sema	PF01403
PBP_dimer	PF03717	SPOR	PF05036
PTP_N	PF12453	SRCR	PF00530
Peptidase_M13_N	PF05649	Sushi	PF00084
Peptidase_M13	PF01431	TIG	PF01833
Peptidase_M23	PF01551	TNF	PF00229
Peptidase_S26	PF10502	TNFR_c6	PF00020
Recep_L_domain	PF01030	TolA	PF06519
Receptor_2B4	PF11465	TonB	PF03544
Receptor_IA-2	PF11548	Transgly	PF00912
Reprolysin	PF01421	Transpeptidase	PF00905
Rhodopsin_N	PF10413	Trefoil	PF00088
Rib_hydrolayse	PF02267	Trypsin	PF00089
Ricin_B_lectin	PF00652	TSP_1	PF00090
RmlD_sub_bind	PF04321	V-set	PF07686
SEC-C	PF02810	VWA	PF00092
Somatomedin_B	PF01033	Xlink	PF00193
Sulfotransfer_1	PF00685

Table 3. The 104 intracellular Pfam profiles used in GPCRpipe.

Pfam ID	Pfam Accession	Pfam ID	Pfam Accession
ATP_Ca_trans_C	PF12424	Latrophilin	PF02354
AlaDh_PNT_N	PF05222	LBR_tudor	PF09465
CagE_TrbE_VirB	PF03135	LEM	PF03020
GluR_Homer-bdg	PF10606	MCPsignal	PF00015
H-K_ATPase_N	PF09040	MgtE_N	PF03448
KcnmB2_inactiv	PF09303	MHC_I_C	PF06623
Syntaxin-6_N	PF09177	Mito_carr	PF00153
TGF_beta_GS	PF08515	Myb_DNA-binding	PF00249
Tcell_CD4_Cterm	PF12104	NC	PF04970
TrwB_AAD_bind	PF10412	Nyv1_N	PF09426
A_deaminase	PF00962	Osmo_CC	PF08946
AA_permease_N	PF08403	PAS_3	PF08447
AAA_5	PF07728	PAS_4	PF08448
AAA	PF00004	PAS	PF00989
ABC_ATPase	PF09818	PDZ	PF00595
ABC_tran	PF00005	Peptidase_M41	PF01434
AlaDh_PNT_C	PF01262	Pkinase	PF00069
Alpha_kinase	PF02816	Pkinase_Tyr	PF07714
Ank	PF00023	Plexin_cytopl	PF08337
BAG	PF02179	Potassium_chann	PF11404
Band_3_cyto	PF07565	PP2C	PF00481
Bcl-2	PF00452	PTS_EIIA_1	PF00358
C2	PF00168	PTS_EIIA_2	PF00359
Ca_chan_IQ	PF08763	PTS_EIIB	PF00367
Calx-beta	PF03160	PTS_IIB	PF02302
CbiA	PF01656	RcsC	PF09456
CBS	PF00571	Response_reg	PF00072
cNMP_binding	PF00027	Ribonuc_2-5A	PF06479
Connexin43	PF03508	RseA_N	PF03872
CUE	PF02845	RyR	PF02026
Death	PF00531	Sed5p	PF11416
DUF1856	PF08983	SH3_1	PF00018
DUF2404	PF10296	SH3_2	PF07653
DUF3131	PF11329	Shal-type	PF11601
FHA	PF00498	SNARE	PF05739
Flavi_NS5	PF00972	SNN_cytoplasm	PF09051
GT36_AF	PF06205	STAS	PF01740
Guanylate_cyc	PF00211	Syndecan	PF01034
HATPase_c	PF02518	Syntaxin	PF00804
HisKA	PF00512	TIR	PF01582
HMA	PF00403	TPR_1	PF00515
HMG-CoA_red	PF00368	TPR_2	PF07719
Homeobox	PF00046	TraG	PF02534
Hpt	PF01627	TrkA_N	PF02254
Hydrolase	PF00702	ubiquitin	PF00240
Integrin_alpha	PF00357	UQ_con	PF00179
ITAM	PF02189	V-SNARE	PF05008
K_tetra	PF02214	V-SNARE_C	PF12352
KCNQ_channel	PF03520	Y_phosphatase	PF00102
Kdo	PF06293	zf-CDGSH	PF09360
KdpD	PF02702	zf-C3HC4	PF00097
Kv2channel	PF03521	ZipA_C	PF04354

Bagos, P.G., Liakopoulos, T.D. and Hamodrakas, S.J. (2006) Algorithms for incorporating prior topological information in HMMs: application to transmembrane proteins, BMC Bioinformatics, 7, 189.

Bjarnadóttir, T.K., et al. (2006) Comprehensive repertoire and phylogenetic analysis of the G protein-coupled receptors in human and mouse, Genomics, 88, 263-273.

Käll, L., Krogh, A. and Sonnhammer, E.L. (2005) An HMM posterior decoder for sequence feature prediction that includes homology information, Bioinformatics, 21 Suppl 1, i251-257.

Kristiansen, K. (2004) Molecular mechanisms of ligand binding, signaling, and regulation within the superfamily of G-protein-coupled receptors: molecular modeling and mutagenesis approaches to receptor structure and function, Pharmacology & Therapeutics, 103, 21-80.

Krogh, A. (1994) Hidden Markov models for labelled sequences. Proceedings of the 12th IAPR International Conference on Pattern Recognition. pp. 140-144.

Oldham, W.M. and Hamm, H.E. (2008) Heterotrimeric G protein activation by G-protein-coupled receptors, Nature Reviews Molecular Cell Biology, 9, 60-71.

Petersen, T.N., et al. (2011) SignalP 4.0: discriminating signal peptides from transmembrane regions, Nature Methods, 8, 785-786.

Punta, M., et al. (2012) The Pfam protein families database, Nucleic Acids Research, 40, D290-301.

Sgourakis, N.G., Bagos, P.G. and Hamodrakas, S.J. (2005) Prediction of the coupling specificity of GPCRs to four families of G-proteins using hidden Markov models and artificial neural networks, Bioinformatics, 21, 4101-4106.