(to check whether a residue belongs to a transit peptide of a chloroplast protein (cTP) or of a mitochondrial protein (mTP), respectively)
Input units
For the N-terminal 100 amino-acid residues of each sequence, we used windows of 55 and 35 residues for neural networks A and B, respectively.
Each residue was represented by 20 nodes of which only the one corresponding to the appropriate amino acid was switched on, whereas the rest remained off.
Example: Ala: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Hidden units : 4
Epochs of training : 200
Learning Rate : 0.01
Output units : 1 for the positive sets of inputs, 0 for the negative.
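The input encoding described above can be sketched as follows. This is a minimal illustration, not the original implementation: the ordering of the 20 amino acids (alphabetical by one-letter code, which puts Ala first as in the example), the centring of each window on a residue, the all-zero padding at the ends of the region, and the names `one_hot` and `encode_windows` are all assumptions.

```python
import numpy as np

# Assumed ordering of the 20 input nodes (Ala first, as in the example).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(residue):
    """Return a 20-element vector with a single 1 at the residue's node."""
    vec = np.zeros(len(AMINO_ACIDS), dtype=np.float32)
    idx = AA_INDEX.get(residue)
    if idx is not None:  # padding and unknown symbols stay all-zero (assumption)
        vec[idx] = 1.0
    return vec


def encode_windows(sequence, window, n_term=100, pad="-"):
    """Encode one window per residue of the N-terminal region.

    Each window is centred on a residue; positions outside the region
    are padded with all-zero vectors (assumption, not stated in the text).
    """
    region = sequence[:n_term]
    half = window // 2
    padded = pad * half + region + pad * half
    rows = []
    for i in range(len(region)):
        win = padded[i:i + window]
        rows.append(np.concatenate([one_hot(r) for r in win]))
    return np.stack(rows)


# Hypothetical sequence: network A (window 55) gets 55 * 20 = 1100 inputs
# per residue, network B (window 35) gets 35 * 20 = 700.
seq = "M" + "A" * 119
inputs_a = encode_windows(seq, 55)
inputs_b = encode_windows(seq, 35)
```

Under these assumptions every residue of the N-terminal region yields one input vector, so a sequence of at least 100 residues produces 100 rows per network.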
(in order to check whether the sequence has a cTP or an mTP)
Input units
The scores from the first neural networks for the N-terminal 100 amino-acid residues of each sequence.
Hidden units : 6
Epochs of training : 200
Learning Rate : 0.01
Output units : 1 for the positive sets of inputs, 0 for the negative.
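To make the shape of this second-level network concrete, here is a minimal forward-pass sketch with 100 input scores, 6 hidden units and a single output. The sigmoid activation, the random initial weights, the single output unit and the class name `TinyMLP` are assumptions made for illustration; training (200 epochs, learning rate 0.01) is not shown.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyMLP:
    """One-hidden-layer feed-forward network (illustrative only)."""

    def __init__(self, n_in, n_hidden, n_out=1, seed=0):
        rng = np.random.default_rng(seed)
        # Small random weights; the original initialisation is not stated.
        self.w1 = rng.normal(0.0, 0.1, size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.1, size=(n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        hidden = sigmoid(x @ self.w1 + self.b1)
        return sigmoid(hidden @ self.w2 + self.b2)


# 100 per-residue scores in, 6 hidden units, 1 output unit.
net = TinyMLP(n_in=100, n_hidden=6)
residue_scores = np.random.default_rng(1).random(100)  # placeholder scores
sequence_score = net.forward(residue_scores)
```

With a sigmoid output, the score lies between 0 and 1, which matches the target values of 1 for positive and 0 for negative inputs.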
(for the prediction of the cleavage site of the cTP and mTP respectively)
Location        Plant   Non-plant
Chloroplast     122     -
Mitochondrion   91      241
The training dataset, as before, consisted of equal numbers of positive and negative sequences. To make the method more objective, we trained 5 different networks for each case (cTP and mTP) and used the average of their prediction scores.
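Averaging the scores of the 5 networks can be sketched as below. The function name `ensemble_score` and the stand-in networks (plain functions returning fixed scores) are hypothetical; in practice each entry would be one of the trained networks.

```python
def ensemble_score(networks, x):
    """Average the prediction scores of several independently trained networks.

    `networks` is any list of callables that map an input to a float score.
    """
    scores = [net(x) for net in networks]
    return sum(scores) / len(scores)


# Five stand-in "networks" returning fixed scores (placeholders only).
stand_ins = [lambda x, s=s: s for s in (0.9, 0.8, 0.7, 0.6, 0.5)]
average = ensemble_score(stand_ins, x=None)
```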
The architecture of the neural networks is:
Input units
For the positive sequences, we used a window of 27 residues (-20 to +6) around the annotated cleavage site for the a1-5 neural networks and a window of 21 residues (-12 to +8) for the b1-5 networks.
For the negative sequences, we used windows of the same sizes around random positions in the sequences. Again, each residue was represented by 20 nodes, as in the first set of neural networks.
Hidden units : 9
Epochs of training : 200
Learning Rate : 0.01
Output units : 1 for the positive sets of inputs, 0 for the negative.
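The window extraction for the cleavage-site networks can be illustrated as follows. This is a sketch under stated assumptions: the cleavage site is given as a 0-based index of the residue counted as position 0, windows that run past the ends of the sequence are padded with "-", and the names `site_window` and `negative_window` are hypothetical.

```python
import random


def site_window(sequence, site, upstream, downstream, pad="-"):
    """Return the residues from site - upstream to site + downstream, inclusive.

    `site` is the 0-based index of the position labelled 0 (assumed
    convention); positions outside the sequence are padded.
    """
    chars = []
    for pos in range(site - upstream, site + downstream + 1):
        chars.append(sequence[pos] if 0 <= pos < len(sequence) else pad)
    return "".join(chars)


def negative_window(sequence, upstream, downstream, rng):
    """Window of the same size around a random position of a negative sequence."""
    return site_window(sequence, rng.randrange(len(sequence)), upstream, downstream)


rng = random.Random(0)
seq = "MASTRVLLAAALRRSGAQLPSRRAFSTTSAAPKVAVLGAAGGIGQPLSLLLK"  # made-up sequence
pos_a = site_window(seq, 25, upstream=20, downstream=6)   # a1-5 networks: 27 residues
pos_b = site_window(seq, 25, upstream=12, downstream=8)   # b1-5 networks: 21 residues
neg_a = negative_window(seq, 20, 6, rng)
```

Each extracted window would then be converted to 20 nodes per residue before being passed to the networks.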
(in order to make the prediction)
As the training set, we used equal numbers of sequences from each category. The architecture of the neural networks was:
Input units
The scores from all the methods used: 13 scores for the plant sequences and 7 scores for the non-plant sequences.
Hidden units : 6 for the plant sequences, 5 for the non-plant sequences.
Epochs of training : 200
Learning Rate : 0.01
Output units : for the plant sequences:
1 0 0 for the inputs of chloroplast proteins,
0 1 0 for the inputs of mitochondrial proteins,
0 0 1 for the inputs of secreted proteins and
0 0 0 for the inputs of the cytoplasmic and nuclear sets (other).
for the non-plant sequences:
1 0 for the inputs of mitochondrial proteins,
0 1 for the inputs of secreted proteins and
0 0 for the inputs of the cytoplasmic and nuclear sets (other).
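The target codes listed above can be written down as lookup tables, together with one possible way of turning raw network outputs back into a label. The decoding rule is an assumption, since the text only gives the training targets: here a sequence is called "other" when no output reaches a threshold of 0.5, and otherwise the strongest output wins. The names `PLANT_TARGETS`, `NONPLANT_TARGETS` and `decode` are hypothetical.

```python
# Training targets as described in the text.
PLANT_TARGETS = {
    "chloroplast": (1, 0, 0),
    "mitochondrion": (0, 1, 0),
    "secreted": (0, 0, 1),
    "other": (0, 0, 0),
}
NONPLANT_TARGETS = {
    "mitochondrion": (1, 0),
    "secreted": (0, 1),
    "other": (0, 0),
}


def decode(outputs, targets, threshold=0.5):
    """Map raw output activations to a location label.

    Assumed rule: all outputs below `threshold` means "other";
    otherwise the unit with the highest activation decides.
    """
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    if outputs[best] < threshold:
        code = tuple(0 for _ in outputs)
    else:
        code = tuple(int(i == best) for i in range(len(outputs)))
    by_code = {c: label for label, c in targets.items()}
    return by_code[code]


plant_label = decode([0.91, 0.10, 0.22], PLANT_TARGETS)
nonplant_label = decode([0.20, 0.80], NONPLANT_TARGETS)
```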