PredSL combines several methods to predict whether a protein is targeted to the chloroplast (including the thylakoids), the mitochondrion, or the secretory pathway.

As input, PredSL requires the protein's sequence in FASTA format.
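As an illustration of the expected input, the following is a minimal sketch of reading FASTA records and keeping the 100 N-terminal residues that the first step works on. The function name `read_fasta` and the example record are hypothetical and are not part of PredSL.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} dictionary."""
    parts: Dict[str, List[str]] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            parts[header] = []
        elif header is not None:
            parts[header].append(line)
    return {h: "".join(chunks) for h, chunks in parts.items()}


# Hypothetical record, for illustration only.
example = """>protein_1 hypothetical example
MASSMLSSATMVASPAQATMVAPF
NGLKSSAAFPATRKANNDITSITS
""".splitlines()

records = read_fasta(example)
# Keep at most the first 100 residues of each sequence.
n_terminal = {h: seq[:100] for h, seq in records.items()}
```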

The algorithm consists of 10 steps:

STEP 1:

First, the 100 N-terminal residues of the sequence are encoded (see supplementary material, neural network training) and fed to a first layer of 2 neural networks, which determine whether each residue belongs to a chloroplast transit peptide (cTP) or a mitochondrial transit peptide (mTP). This step yields 100 scores (one per residue) from each network.

STEP 2:

A cutoff marks the point at which residues no longer belong to a transit peptide. Applying it to the 100 scores from step 1 gives two approximate cleavage sites, one from each network.
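The cutoff idea can be sketched as follows. This is only an illustration: the cutoff value of 0.5, the rule "first residue scoring below the cutoff", and the name `approximate_cleavage_site` are assumptions, since the text does not state how the cutoff is applied.

```python
from typing import Sequence


def approximate_cleavage_site(scores: Sequence[float], cutoff: float = 0.5) -> int:
    """Return the 0-based index of the first residue scoring below `cutoff`.

    Assumption: residues before that index are treated as the transit peptide.
    If no residue falls below the cutoff, len(scores) is returned.
    """
    for i, score in enumerate(scores):
        if score < cutoff:
            return i
    return len(scores)


# Made-up per-residue scores standing in for the two step 1 networks.
ctp_scores = [0.9] * 45 + [0.2] * 55
mtp_scores = [0.8] * 30 + [0.1] * 70

approx_ctp_site = approximate_cleavage_site(ctp_scores)
approx_mtp_site = approximate_cleavage_site(mtp_scores)
```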

STEP 3:

We take a window of 40 positions around the approximate cleavage site estimated in step 2 and use a set of neural networks to predict the cleavage site. This gives one predicted cleavage site for the hypothetical cTP and one for the hypothetical mTP. We then calculate the average of the scores of the hypothetical peptide predicted from each network, which gives two scores.
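The window selection and the score averaging can be sketched as below. Two assumptions are made here that the text does not confirm: the 40-position window is centered on the approximate site and shifted inward at the sequence ends, and the average is taken over the step 1 scores of the residues before the predicted cleavage site. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def window_around(site: int, length: int, width: int = 40) -> Tuple[int, int]:
    """Return (start, end) of a `width`-long window around `site`.

    The window is centered on the site and shifted inward when it would
    run past either end of the sequence (an assumption of this sketch).
    """
    start = max(0, site - width // 2)
    end = min(length, start + width)
    start = max(0, end - width)
    return start, end


def mean_peptide_score(scores: Sequence[float], cleavage_site: int) -> float:
    """Average the per-residue scores of the hypothetical transit peptide."""
    peptide = scores[:cleavage_site]
    return sum(peptide) / len(peptide) if peptide else 0.0


scores = [0.9] * 45 + [0.2] * 55   # made-up step 1 scores
start, end = window_around(45, len(scores))
average = mean_peptide_score(scores, 45)
```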

STEP 4:

We feed the 100 scores from step 1 to two neural networks (one for the cTP and one for the mTP) and get two more scores. These scores represent the probability that the sequence has a cTP or an mTP.

STEP 5:

We use PrediSi to calculate one more score for each sequence. This score represents the probability that the sequence belongs to a secreted protein.

STEP 6:

We use a Markov chain program that discriminates between two categories to obtain 6 more scores for plant proteins and 3 more for non-plant proteins (see supplementary material, Markov chains).

STEP 7:

We use HMMER to get two additional scores for each protein: one indicating whether a cTP is present and one indicating whether an mTP is present.

STEP 8:

We feed all the scores gathered so far (13 for plant and 7 for non-plant proteins) to a neural network that makes the final prediction.
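The score counts can be checked with a small sketch that assembles the input vector for the final network. For plant proteins the earlier steps add up to 13 scores: 2 peptide averages, 2 transit peptide probabilities, 1 PrediSi score, 6 Markov chain scores, and 2 HMMER scores. The non-plant breakdown used below (dropping the cTP-related scores, giving 1 + 1 + 1 + 3 + 1 = 7) is an assumption chosen because it reproduces the stated total; the text does not spell it out. The function name and the numeric values are hypothetical.

```python
from typing import List, Sequence


def assemble_features(
    peptide_averages: Sequence[float],
    presence_scores: Sequence[float],
    predisi_score: float,
    markov_scores: Sequence[float],
    hmmer_scores: Sequence[float],
    plant: bool,
) -> List[float]:
    """Concatenate all scores and check the expected total (13 or 7)."""
    features = [
        *peptide_averages,
        *presence_scores,
        predisi_score,
        *markov_scores,
        *hmmer_scores,
    ]
    expected = 13 if plant else 7
    if len(features) != expected:
        raise ValueError(f"expected {expected} scores, got {len(features)}")
    return features


# Placeholder values, for illustration only.
plant_features = assemble_features(
    [0.7, 0.2], [0.8, 0.1], 0.05, [0.1] * 6, [0.9, 0.0], plant=True
)
# Assumed non-plant layout: no cTP-related scores.
nonplant_features = assemble_features(
    [0.3], [0.4], 0.6, [0.2] * 3, [0.1], plant=False
)
```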

STEP 9:

Finally, if a sequence is predicted to belong to a chloroplast protein, we use HMMER to determine whether it has a lumenal transit peptide (lTP).

STEP 10:

If the user requests it, PredSL can produce a graph for each case. The graphs for chloroplast and mitochondrial sequences are created from the step 1 scores, taking a window around the cleavage site predicted in step 3. For secreted proteins, the graphs are created using the hydrophobicity index (Kyte and Doolittle, 1982) for a window around the cleavage site predicted by PrediSi.
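The data behind the secreted-protein graph can be sketched as follows. The table holds the Kyte-Doolittle hydropathy values; the window size (`flank`), the function name, and the choice to score unknown residues as 0.0 are assumptions of this sketch, not details taken from PredSL.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy index (Kyte and Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_window(sequence: str, site: int, flank: int = 10) -> List[Tuple[int, float]]:
    """Return (1-based position, hydropathy) pairs around a cleavage site.

    `site` is a 0-based index; `flank` residues are taken on each side.
    Residues missing from the table (for example X) are scored as 0.0,
    which is an assumption of this sketch.
    """
    start = max(0, site - flank)
    end = min(len(sequence), site + flank)
    return [
        (i + 1, KYTE_DOOLITTLE.get(sequence[i].upper(), 0.0))
        for i in range(start, end)
    ]


# Toy sequence and site, for illustration only.
points = hydropathy_window("MKALIV", 3, flank=2)
```

The resulting pairs can be passed to any plotting library to draw the profile.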
