GPCRpipe

 

About GPCRpipe

Basic Theory

G-protein coupled receptors (GPCRs) represent the largest and most diverse superfamily of transmembrane receptors in eukaryotic cells, with nearly 800 genes encoding such receptors in the human genome (Bjarnadóttir, et al., 2006). At the interface between the extracellular and intracellular milieu, they are involved in the regulation of nearly every physiological process by converting extracellular stimuli into intracellular responses. Most GPCR functions are mediated by a special group of proteins called G-proteins. G-proteins act as switches that transduce messages from the extracellular space into the cell through their interaction with GPCRs (Kristiansen, 2004). G-proteins interact with various effector molecules to immediately change the concentrations of cellular molecules, eventually leading to a wide range of cellular and physiological responses (Oldham and Hamm, 2008).

All known members of the GPCR superfamily share a common topology: seven transmembrane helices, three intracellular and three extracellular loops, an extracellular N-terminus and an intracellular C-terminus. Despite their common architecture and function, GPCRs show considerable diversity at the sequence level. This lack of sequence similarity makes it difficult to detect GPCRs in proteomes and, especially, to find novel members of the GPCR superfamily.
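The common topology described above can be expressed as a simple check on a per-residue label string. The following is a minimal sketch, not GPCRpipe code: the label alphabet (`O` for extracellular, `M` for transmembrane, `I` for intracellular) and the function names are assumptions made for illustration.

```python
from itertools import groupby

# Hypothetical label alphabet: O = extracellular, M = transmembrane, I = intracellular.
# Expected order of regions for a GPCR: extracellular N-terminus, seven helices
# separated by alternating intra- and extracellular loops, intracellular C-terminus.
GPCR_PATTERN = "OMIMOMIMOMIMOMI"


def segments(labels):
    """Collapse a per-residue label string into (label, start, end) runs (1-based, inclusive)."""
    runs = []
    pos = 1
    for label, group in groupby(labels):
        length = len(list(group))
        runs.append((label, pos, pos + length - 1))
        pos += length
    return runs


def has_gpcr_topology(labels):
    """Return True if the label string shows the seven-helix GPCR arrangement."""
    order = "".join(label for label, _, _ in segments(labels))
    return order == GPCR_PATTERN
```

A call such as `has_gpcr_topology(predicted_labels)` only inspects the order of the regions; it does not check helix or loop lengths.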

Here we present GPCRpipe, a pipeline for the accurate detection of GPCRs in proteomes, based on previously published tools and a hidden Markov model (HMM) specifically designed to detect GPCRs.



How to run GPCRpipe?

GPCRpipe is easy to use. On the Run page of the tool, paste one or more FASTA sequences (up to 100 per run) into the text box or upload a text file containing the query sequences, choose one of the two available methods (AND/OR) using the respective radio button, and, finally, press the "Submit" button (Figure 1).
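Before submitting, it can be convenient to check locally that a query file is valid FASTA and respects the limit of 100 sequences per run. The sketch below is one way to do this; `parse_fasta` and `MAX_SEQUENCES` are hypothetical names, and the server's own validation may differ.

```python
MAX_SEQUENCES = 100  # per-run limit stated for the Run page


def parse_fasta(text):
    """Parse FASTA text into {header: sequence}. Hypothetical helper, not GPCRpipe code."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            if header in records:
                raise ValueError("duplicate header: " + header)
            records[header] = ""
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[header] += line.upper()
    if len(records) > MAX_SEQUENCES:
        raise ValueError(
            "%d sequences given; the limit is %d per run" % (len(records), MAX_SEQUENCES)
        )
    return records
```

Larger sets can be split into batches of at most 100 records and submitted as separate runs.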

Figure 1. The Run page of GPCRpipe


Within a couple of minutes the Result page is returned to the user (Figure 2). It reports whether each submitted sequence is predicted to be a GPCR. For each predicted GPCR, further information is provided: the HMM Reliability Score, the GPCR-specific Pfam profile with its limits, the signal peptide predicted by SignalP 4.0 (Petersen, et al., 2011), the coupling specificity predicted by PRED-COUPLE2 (Sgourakis, et al., 2005), the intra- and/or extracellular domain profiles with their respective limits, and, finally, the topology predicted by the HMM. At the top of the Result page, a text output file covering all the query sequences is provided.

Figure 2. The Result Page of GPCRpipe



How does GPCRpipe work?

GPCRpipe consists of two layers (Figure 3). The first layer is GPCR detection, which is the main feature of the pipeline. Two methods are used to detect GPCRs in a set of proteins: (a) an HMM specifically designed for the detection of GPCRs (Figure 4), which is a modified version of the model used by the HMM-TM algorithm (Bagos, et al., 2006), and (b) a library of 35 Pfam profile HMMs (pHMMs; Table 1) that are specific for different families of GPCRs (Punta, et al., 2012). A protein is considered a GPCR when it is predicted by both methods, and only sequences longer than 200 amino acids can be detected as such. The second layer provides feature annotation for every predicted GPCR using preexisting tools.
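The decision rule of the detection layer can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline's actual code: the function name is hypothetical, the "AND" mode is taken to mean that both methods must agree, the "OR" mode that either method suffices, and the length filter is assumed to apply in both modes.

```python
MIN_LENGTH = 200  # only sequences longer than this may be reported as GPCRs


def is_gpcr(seq_length, hmm_hit, pfam_hit, mode="AND"):
    """Combine the two detection methods (hypothetical helper).

    hmm_hit  -- True if the GPCR-specific HMM predicts a GPCR
    pfam_hit -- True if any pHMM of the GPCR library matches
    mode     -- "AND" (both methods) or "OR" (either method); the reading of
                these two options is an assumption
    """
    if seq_length <= MIN_LENGTH:
        return False
    if mode == "AND":
        return hmm_hit and pfam_hit
    if mode == "OR":
        return hmm_hit or pfam_hit
    raise ValueError("mode must be 'AND' or 'OR'")
```

Under these assumptions the "AND" mode favors precision, while the "OR" mode favors sensitivity, which may help when looking for divergent family members.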

Figure 3. Workflow of GPCRpipe, consisting of two layers: GPCR detection and feature annotation.

The model that we used is cyclic and consists of 114 states, including begin (B) and end (E) states. It comprises three sub-models corresponding to the three labels to be predicted: the TM (transmembrane) helix sub-model and the inner and outer loop sub-models. The TM helix sub-model incorporates states that model the architecture of the transmembrane helices: there are states that correspond to the core of the helix and to the cap located at the lipid bilayer interface. All states are connected with the appropriate transition probabilities in order to be consistent with the known structures, that is, to ensure an appropriate length distribution. The inner and outer loops are modeled with a "ladder" architecture. At the top of each ladder is a self-transitioning state corresponding to residues too distant from the membrane; these cannot be modeled as loops, hence that state is named "globular". The model was trained using the Baum-Welch algorithm for labeled sequences (Krogh, 1994), and decoding was performed using the Optimal Accuracy Posterior Decoder (Käll, et al., 2005).
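To see why the state architecture matters for length modeling, consider a single self-transitioning state such as the "globular" state. In any HMM, a state that returns to itself with probability p emits runs whose lengths follow a geometric distribution with mean 1/(1 - p). The sketch below computes this distribution; the value of `p_stay` used in the usage note is illustrative only and is not a trained parameter of the GPCRpipe model.

```python
def run_length_pmf(p_stay, max_len):
    """P(run length = k), k = 1..max_len, for one state with self-transition p_stay."""
    return [(p_stay ** (k - 1)) * (1.0 - p_stay) for k in range(1, max_len + 1)]


def mean_run_length(p_stay):
    """Expected number of consecutive residues emitted by the state."""
    return 1.0 / (1.0 - p_stay)
```

For example, `mean_run_length(0.9)` gives an expected run of about 10 residues, and the most probable run length is always 1. Such a monotonically decreasing distribution is a poor fit for helices of fairly constant length, which is a general motivation for modeling them with a chain of several connected states instead of a single looping state.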

Figure 4. A schematic representation of the model's architecture. The model consists of three sub-models denoted by the labels: Cytoplasmic loop, Transmembrane Helix and Extracellular loop. Within each sub-model, states with the same shape, size and colour share the same emission probabilities (parameter tying). Allowed transitions are indicated with arrows.



Table 1. The 35 Pfam families (pHMMs) comprising the pHMM library used in GPCRpipe.

Pfam ID           Pfam Accession    Pfam ID           Pfam Accession
7tm_1             PF00001           DUF1182           PF06681
7tm_2             PF00002           DUF621            PF04789
7tm_3             PF00003           Frizzled          PF01534
7TM_GPCR_Srab     PF10292           Git3              PF11710
7TM_GPCR_Sra      PF02117           GpcrRhopsn4       PF10192
7TM_GPCR_Srbc     PF10316           Lung_7-TM_R       PF06814
7TM_GPCR_Srb      PF02175           Ocular_alb        PF02101
7TM_GPCR_Srd      PF10317           Serpentine_r_xa   PF03383
7TM_GPCR_Srh      PF10318           Sre               PF03125
7TM_GPCR_Sri      PF10327           Srg               PF02118
7TM_GPCR_Srj      PF10319           STE2              PF02116
7TM_GPCR_Srsx     PF10320           STE3              PF02076
7TM_GPCR_Srt      PF10321           TAS2R             PF05296
7TM_GPCR_Sru      PF10322           V1R               PF03402
7TM_GPCR_Srv      PF10323           Dicty_CAR         PF05462
7TM_GPCR_Srw      PF10324           7tm_4             PF13853
7TM_GPCR_Srx      PF10328
7TM_GPCR_Srz      PF10325
7TM_GPCR_Str      PF10326



In the feature annotation phase, further information is provided for each predicted GPCR:

1. Family annotation using the best-hit pHMM from the Pfam library (Punta, et al., 2012) (Table 1)

2. Coupling specificity to certain families of G-proteins using the PRED-COUPLE2 algorithm (Sgourakis, et al., 2005)

3. Topology provided by (a) the HMM, (b) the occurrence of Pfam profiles with known extra- or intra-cellular position and (c) the signal peptide's position provided by SignalP 4.0 (Petersen, et al., 2011)

The Pfam profiles with known extra- or intracellular position were gathered manually. Tables 2 and 3 list the 153 extracellular and the 104 intracellular domain profiles, respectively.
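The use of such profiles as topological evidence can be illustrated with a small lookup. The sketch below is an assumption about how this evidence might be compared with the HMM prediction, not a description of GPCRpipe's implementation. The dictionaries hold only a few entries taken from Tables 2 and 3, and the function names and the label alphabet (`O` extracellular, `M` membrane, `I` intracellular) are hypothetical.

```python
# Excerpts of Table 2 and Table 3 (accession -> Pfam ID); the full lists are in the tables.
EXTRACELLULAR = {"PF01094": "ANF_receptor", "PF01392": "Fz", "PF02793": "HRM"}
INTRACELLULAR = {"PF00595": "PDZ", "PF00069": "Pkinase"}


def domain_side(accession):
    """Return the side of the membrane implied by a Pfam accession, or 'unknown'."""
    acc = accession.split(".")[0]  # drop a version suffix such as "PF01094.23"
    if acc in EXTRACELLULAR:
        return "extracellular"
    if acc in INTRACELLULAR:
        return "intracellular"
    return "unknown"


def agrees_with_topology(accession, start, end, labels):
    """Compare a domain hit (residues start..end, 1-based) with per-residue labels.

    Returns None when the domain carries no positional information, otherwise
    True if more than half of the covered residues carry the expected label.
    """
    side = domain_side(accession)
    if side == "unknown":
        return None
    expected = "O" if side == "extracellular" else "I"
    region = labels[start - 1:end]
    return region.count(expected) > len(region) / 2
```

A disagreement between the two sources, for instance an extracellular profile over residues labeled as intracellular, would flag a prediction worth inspecting manually.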



Table 2. The 153 extracellular Pfam profiles used in GPCRpipe.

Pfam ID           Pfam Accession    Pfam ID           Pfam Accession
Beta_HSD          PF01073           TarH              PF02203
ANF_receptor      PF01094           Thyroglobulin_1   PF00086
Abhydrolase_3     PF07859           Tissue_fac        PF01108
Activin_recp      PF01064           Tme5_EGF_like     PF09064
AhpC-TSA          PF00578           V-set_CD47        PF08204
Alpha-mann_mid    PF09261           YscJ_FliF_C       PF08345
Amidase_2         PF01510           YycI              PF09648
C1-set            PF07654           adh_short         PF00106
C2-set_2          PF08205           Asp               PF00026
C2-set            PF05790           Branch            PF02485
CD4-extracel      PF09191           Cadherin          PF00028
Cache_1           PF02743           Calreticulin      PF00262
Cadherin_2        PF08266           Carb_anhydrase    PF00194
Cadherin_pro      PF08758           CIMR              PF00878
CtaG_Cox11        PF04442           CN_hydrolase      PF00795
Cu-oxidase_2      PF07731           COesterase        PF00135
Cu-oxidase_3      PF07732           COX2              PF00116
DPPIV_N           PF00930           Cu-oxidase        PF00394
DUF1968           PF09291           CUB               PF00431
DUF1986           PF09342           Cytochrom_C       PF00034
DotA              PF11388           DSL               PF01414
DsbC              PF11412           EGF_2             PF07974
Duffy_binding     PF05424           EGF               PF00008
EGF_CA            PF07645           Ephrin            PF00812
Ephrin_lbd        PF01404           fn2               PF00040
Epimerase         PF01370           Fringe            PF02434
EpoR_lig-bind     PF09067           Furin-like        PF00757
Evr1_Alr          PF04777           Fz                PF01392
F5_F8_type_C      PF00754           Glycos_transf_2   PF00535
FAD_binding_8     PF08022           Hemopexin         PF00045
FG-GAP            PF01839           HlyD              PF00529
FecR              PF04773           HRM               PF02793
FtsQ              PF03799           ICAM_N            PF03921
GDA1_CD39         PF01150           ig                PF00047
GSPII_G           PF08334           Integrin_beta     PF00362
GSPII_IJ          PF02501           KR                PF08659
Glyco_hydro_31    PF01055           Ldl_recept_a      PF00057
Glyco_hydro_38C   PF07748           Ldl_recept_b      PF00058
Glyco_hydro_38    PF01074           Lectin_C          PF00059
Glyco_hydro_47    PF01532           LRR_1             PF00560
Glyco_transf_15   PF01793           LRRNT             PF01462
Glyco_transf_43   PF03360           Malectin          PF11721
Glyco_transf_64   PF09258           MAM               PF00629
Glyco_transf_6    PF03414           MHC_I             PF00129
GspJ              PF11612           MHC_II_alpha      PF00993
GspK              PF03934           MHC_II_beta       PF00969
Hema_HEFG         PF02710           NOD               PF06816
Hema_esterase     PF03996           NODP              PF07684
Hepsin-SRCR       PF09272           OmpA              PF00691
Herpes_glycop_D   PF01537           PD40              PF07676
I-set             PF07679           Peptidase_M10     PF00413
IL6Ra-bind        PF09240           Peptidase_S10     PF00450
Ig_Tie2_1         PF10430           Peptidase_S24     PF00717
Integrin_alpha2   PF08441           Peptidase_S9      PF00326
Interfer-bind     PF09294           PKD               PF00801
Lectin_leg-like   PF03388           POTRA_1           PF08478
Lep_receptor_Ig   PF06328           PSI               PF01437
MHCassoc_trimer   PF08831           Redoxin           PF08534
NAD_binding_4     PF07993           Rieske            PF00355
NCD3G             PF07562           Sema              PF01403
PBP_dimer         PF03717           SPOR              PF05036
PTP_N             PF12453           SRCR              PF00530
Peptidase_M13_N   PF05649           Sushi             PF00084
Peptidase_M13     PF01431           TIG               PF01833
Peptidase_M23     PF01551           TNF               PF00229
Peptidase_S26     PF10502           TNFR_c6           PF00020
Recep_L_domain    PF01030           TolA              PF06519
Receptor_2B4      PF11465           TonB              PF03544
Receptor_IA-2     PF11548           Transgly          PF00912
Reprolysin        PF01421           Transpeptidase    PF00905
Rhodopsin_N       PF10413           Trefoil           PF00088
Rib_hydrolayse    PF02267           Trypsin           PF00089
Ricin_B_lectin    PF00652           TSP_1             PF00090
RmlD_sub_bind     PF04321           V-set             PF07686
SEC-C             PF02810           VWA               PF00092
Somatomedin_B     PF01033           Xlink             PF00193
Sulfotransfer_1   PF00685


Table 3. The 104 intracellular Pfam profiles used in GPCRpipe.

Pfam ID           Pfam Accession    Pfam ID           Pfam Accession
ATP_Ca_trans_C    PF12424           Latrophilin       PF02354
AlaDh_PNT_N       PF05222           LBR_tudor         PF09465
CagE_TrbE_VirB    PF03135           LEM               PF03020
GluR_Homer-bdg    PF10606           MCPsignal         PF00015
H-K_ATPase_N      PF09040           MgtE_N            PF03448
KcnmB2_inactiv    PF09303           MHC_I_C           PF06623
Syntaxin-6_N      PF09177           Mito_carr         PF00153
TGF_beta_GS       PF08515           Myb_DNA-binding   PF00249
Tcell_CD4_Cterm   PF12104           NC                PF04970
TrwB_AAD_bind     PF10412           Nyv1_N            PF09426
A_deaminase       PF00962           Osmo_CC           PF08946
AA_permease_N     PF08403           PAS_3             PF08447
AAA_5             PF07728           PAS_4             PF08448
AAA               PF00004           PAS               PF00989
ABC_ATPase        PF09818           PDZ               PF00595
ABC_tran          PF00005           Peptidase_M41     PF01434
AlaDh_PNT_C       PF01262           Pkinase           PF00069
Alpha_kinase      PF02816           Pkinase_Tyr       PF07714
Ank               PF00023           Plexin_cytopl     PF08337
BAG               PF02179           Potassium_chann   PF11404
Band_3_cyto       PF07565           PP2C              PF00481
Bcl-2             PF00452           PTS_EIIA_1        PF00358
C2                PF00168           PTS_EIIA_2        PF00359
Ca_chan_IQ        PF08763           PTS_EIIB          PF00367
Calx-beta         PF03160           PTS_IIB           PF02302
CbiA              PF01656           RcsC              PF09456
CBS               PF00571           Response_reg      PF00072
cNMP_binding      PF00027           Ribonuc_2-5A      PF06479
Connexin43        PF03508           RseA_N            PF03872
CUE               PF02845           RyR               PF02026
Death             PF00531           Sed5p             PF11416
DUF1856           PF08983           SH3_1             PF00018
DUF2404           PF10296           SH3_2             PF07653
DUF3131           PF11329           Shal-type         PF11601
FHA               PF00498           SNARE             PF05739
Flavi_NS5         PF00972           SNN_cytoplasm     PF09051
GT36_AF           PF06205           STAS              PF01740
Guanylate_cyc     PF00211           Syndecan          PF01034
HATPase_c         PF02518           Syntaxin          PF00804
HisKA             PF00512           TIR               PF01582
HMA               PF00403           TPR_1             PF00515
HMG-CoA_red       PF00368           TPR_2             PF07719
Homeobox          PF00046           TraG              PF02534
Hpt               PF01627           TrkA_N            PF02254
Hydrolase         PF00702           ubiquitin         PF00240
Integrin_alpha    PF00357           UQ_con            PF00179
ITAM              PF02189           V-SNARE           PF05008
K_tetra           PF02214           V-SNARE_C         PF12352
KCNQ_channel      PF03520           Y_phosphatase     PF00102
Kdo               PF06293           zf-CDGSH          PF09360
KdpD              PF02702           zf-C3HC4          PF00097
Kv2channel        PF03521           ZipA_C            PF04354



References

Bagos, P.G., Liakopoulos, T.D. and Hamodrakas, S.J. (2006) Algorithms for incorporating prior topological information in HMMs: application to transmembrane proteins, BMC Bioinformatics, 7, 189.

Bjarnadóttir, T.K., et al. (2006) Comprehensive repertoire and phylogenetic analysis of the G protein-coupled receptors in human and mouse, Genomics, 88, 263-273.

Käll, L., Krogh, A. and Sonnhammer, E.L. (2005) An HMM posterior decoder for sequence feature prediction that includes homology information, Bioinformatics, 21 Suppl 1, i251-257.

Kristiansen, K. (2004) Molecular mechanisms of ligand binding, signaling, and regulation within the superfamily of G-protein-coupled receptors: molecular modeling and mutagenesis approaches to receptor structure and function, Pharmacology & Therapeutics, 103, 21-80.

Krogh, A. (1994) Hidden Markov models for labelled sequences, Proceedings of the 12th IAPR International Conference on Pattern Recognition, pp. 140-144.

Oldham, W.M. and Hamm, H.E. (2008) Heterotrimeric G protein activation by G-protein-coupled receptors, Nature Reviews Molecular Cell Biology, 9, 60-71.

Petersen, T.N., et al. (2011) SignalP 4.0: discriminating signal peptides from transmembrane regions, Nature Methods, 8, 785-786.

Punta, M., et al. (2012) The Pfam protein families database, Nucleic Acids Research, 40, D290-301.

Sgourakis, N.G., Bagos, P.G. and Hamodrakas, S.J. (2005) Prediction of the coupling specificity of GPCRs to four families of G-proteins using hidden Markov models and artificial neural networks, Bioinformatics, 21, 4101-4106.


University of Athens
Faculty of Biology
Biophysics & Bioinformatics Laboratory